JRC Data Validation tool for Data Calls collected under the EU Data Collection Framework

DV Tool 4.14 User Manual

JRC G.04 Data Collection Team

EUR xxxxx EN - 2013

The mission of the JRC-IPSC is to provide research results and to support EU policy-makers in their effort towards global security and towards protection of European citizens from accidents, deliberate attacks, fraud and illegal actions against EU policies.

European Commission
Joint Research Centre
Institute for the Protection and Security of the Citizen

Contact information
Address: Via Fermi 2749, T.P. 051 – Ispra (VA), 21027 - Italy
E-mail: [email protected]
Tel.: 0332 786479
Fax: 0332 789658
http://ipsc.jrc.ec.europa.eu/
http://www.jrc.ec.europa.eu/

Legal Notice
Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.

Europe Direct is a service to help you find answers to your questions about the European Union
Freephone number (*): 00 800 6 7 8 9 10 11
(*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed.

A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu/

JRC [PUBSY request]
EUR XXXXX LL
ISBN X-XXXX-XXXX-X
ISSN XXXX-XXXX
doi:XXXXX

Luxembourg: Publications Office of the European Union
© European Union, 2013
Reproduction is authorised provided the source is acknowledged
Printed in Italy

TABLE OF CONTENTS

1. Introduction
2. DV tool and Microsoft Office Excel versions
3. How to use the DV tool features
3.1. At opening
3.2. DV tool control panel
3.2.1. Data Check area
3.2.1.1. Step 1: add template
3.2.1.2. Step 2: paste data into the template
3.2.1.3. Step 3: check codification
3.2.1.4. Step 4: check duplication
3.2.1.5. Step 5: check correspondences
3.2.1.6. Step 6: export data
3.2.2. Support area
4. Important Notes
5. Disclaimer of warranty

LIST OF FIGURES

Figure 3.1: The DV tool button in Excel’s Home ribbon.
Figure 3.2: The DV tool control panel.
Figure 3.3: Buttons of the ‘support area’.
Figure 3.4: Selection template window for the Fishing Effort Regime data call.
Figure 3.5: Selection template window for the Fleet Economic data call.
Figure 3.6: Import of data from a file with comma separator as delimiter.
Figure 3.7: Prompt message during import process.
Figure 3.8: Check Codification process executed on the selected (active) worksheet.
Figure 3.9: Duplication check panel.
Figure 3.10: Duplication check example: row 5 is a duplication of row 4 and row 12 is equal to row 11.
Figure 3.11: Correspondence check panel.
Figure 3.12: Correspondence check example (Effort data call).
Figure 3.13: Option to split the file to smaller size files.
Figure 3.14: Choose to split the data by year.
Figure 3.15: Successful export message.
Figure 3.16: Example of exported data in the created directory (export by year).

1. Introduction

The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and embedded in a specifically designed Excel Workbook. The main purpose of this tool is to facilitate and support the Member States in uploading data to meet the requirements defined by DG MARE in official DCF data calls by STECF (Council Regulation 199/2008). The use of this tool is not mandatory. However, the data validation checks performed by the DV tool can greatly reduce the number of erroneous records contained in a file to be uploaded to the DCF web site, and hence facilitate the uploading procedure.

The tool is capable of checking national data stored in Excel rows against certain codifications and rules as requested in the data call. Checks can be performed on partial or whole data sets to be uploaded to the DCF web site. The majority of the checks concern the use of valid codes as defined in the data call and the type of the data entered (numeric or text). Erroneous data are identified and can easily be corrected.

The DV tool is designed to check data that are provided in predefined worksheets depending on the type of data call. Each data call has a dedicated DV tool.

The first version of the DV tool was developed to serve the needs of the Fishing Effort Regimes and the Mediterranean & Black Sea data calls. It was a set of Excel files, one for each type of data requested by the data call. A second version was produced for the Fleet Economic data call in 2013.
It contained improvements based on experience acquired with the previous data calls: a single Excel file containing all the data templates, with the possibility to automatically update, directly from the database, the codes used by the DV tool for the validation of the data.

It was realised that ideally the JRC should develop a DV tool that could serve different data calls while remaining easy to maintain. Therefore the original DV tool, developed by Nikolaos Mitrakis in 2011, has been re-engineered and re-designed from scratch to allow the flexibility and maintainability desired. However, the tool is still under development, and any feedback from users that could help improve the tool according to their needs is most welcome. Any change in the data call requires a corresponding adaptation of the modules.

The purpose of this manual is to provide the general concept surrounding the tool and guidelines for its use. Since the tool is integrated in Excel Workbooks, all Excel functions are still available.

The tool is available for download from the Data Collection Framework web site: https://datacollection.jrc.ec.europa.eu

2. DV tool and Microsoft Office Excel versions

To benefit from the DV tool functionalities it is necessary to run it on a Windows operating system where Microsoft Excel is installed. The DV tool can be used within Microsoft Excel 2003, 2007, 2010 and 2013, in either the 32-bit or the 64-bit edition.

3. How to use the DV tool features

DV tool stands for Data Validation tool. This chapter explains the functionalities and the steps to be performed in order to create final Excel data files that can be uploaded to the ‘Upload Facility’ available at the Data Collection web site.

3.1. At opening

When opening the DV tool file with Microsoft Office 2007/2010 (see below for Microsoft Office 2003), a ‘Security Warning’ may appear with the message: “Macros have been disabled. Options…” This is a standard security option in all Microsoft applications, since macros can potentially have access to your data. In order to continue with the use of the tool, the user should choose: “Options → Enable this content”. To change the security settings go to “Developer → Code → Macro Security → Disable all macros with notification”. However, we recommend that the default option not be changed (although the DV tool is a trusted application, other macros can potentially cause problems).

Once the macros are enabled, the DV Tool button is visible in the Home tab of the Excel application.

Figure 3.1: The DV tool button in Excel’s Home ribbon.

In order to use the DV tool with MS Office 2003 it is necessary to download and install (if not already installed) “Microsoft’s Office Compatibility Pack for Word, Excel and PowerPoint File Formats”, which is available for free at: http://www.microsoft.com/en-us/download/details.aspx?id=3

Attention should be given to the restriction of a maximum number of 65535 rows (the first row is always the header). However, this is the theoretical limit of rows in Excel 2003. In practice, and especially when dealing with large files, a large number of rows can cause significant computational time or even an ‘Excel not responding’ status. Users are strongly advised to use Excel 2007 when dealing with more than 20000 rows, especially when dealing with the catch data table (sheet name EFF_01_CATCH). Another option is first to split the data set manually into smaller parts (e.g. one sheet per year of data) and then insert or paste them into the DV tool. Apply the DV tool separately to each of these smaller data sets.

The environment is the same except for the option to recall and show the control panel from the Office ribbon, since this is not available for pre-Office 2007 versions. In order to re-show the DV control panel, simply use the key combination Ctrl + J, or just save and re-open the template.
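The manual splitting suggested above for large data sets (for example one sheet per year of data) can also be prepared outside Excel. The following is a minimal Python sketch of the idea, not part of the DV tool; the function names and the column names YEAR and VALUE are assumptions made purely for illustration.

```python
from collections import defaultdict


def split_by_year(rows, year_column):
    """Group data rows (dicts keyed by heading) by the value of the year column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[year_column]].append(row)
    return dict(groups)


def split_by_size(rows, max_rows):
    """Cut a list of data rows into consecutive chunks of at most max_rows rows."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


# Hypothetical sample data; real headings are defined by the data call templates.
rows = [
    {"YEAR": "2011", "VALUE": "10"},
    {"YEAR": "2012", "VALUE": "20"},
    {"YEAR": "2011", "VALUE": "30"},
]
by_year = split_by_year(rows, "YEAR")   # one group of rows per year
chunks = split_by_size(rows, 2)         # groups of at most two rows
```

Each resulting group can then be pasted into its own worksheet and checked separately.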
In Excel 2003, to change the security settings for the macros go to Tools → Macro → Security → Security Level → Medium.

3.2. DV tool control panel

The DV tool control panel is the first and main window that pops up when opening the Workbook (Figure 3.2). Here, users can access all the functions that the DV tool provides.

Figure 3.2: The DV tool control panel.

The panel window consists of two main areas: (1) a ‘Data Check’ area and (2) a ‘Support’ area.

The ‘Data Check’ area is made of step-buttons (see the big coloured buttons in Figure 3.2) whose actions can be performed on a single worksheet or on all worksheets at once. While the blue buttons act on the worksheet as a whole (create, export, import…), the other buttons act on the data contained in the worksheet(s), using the corresponding colour to identify the type of error-check.

The ‘Support’ area is made of smaller buttons visible at the top right corner of the control panel. These buttons can be used to manipulate the results of the checks and to request help.

Figure 3.3: Buttons of the ‘support area’.

3.2.1. Data Check area

Users are strongly advised to follow each step in order to guarantee the appropriate data checks. The available buttons are:

Step 1: Add a template structure
Step 2: Import Data
Step 3: Codification check = checks if worksheets conform to the data call template definition
Step 4: Duplication check = checks for duplicated rows; the columns involved in this process are established by the template definition
Step 5: Correspondence check = checks if cells of predefined columns on the same row contain valid combinations of values
Step 6: Export Data = exports the data in the workbook to files ready to be uploaded to the DCF web site

Step 1 and Step 2 are not necessary if the user intends to import existing Excel worksheets using Excel's own functionalities.

3.2.1.1. Step 1: add template

In order to upload the DCF data into the ‘Upload facility’ of the Data Collection web site the user needs to prepare the data in a specific format. This format can differ from data call to data call, because each call may request a different set of data aggregated at different levels. The specific format is defined by templates; these templates are Excel worksheets with pre-defined headings.

Clicking on the ‘add worksheet template’ button opens a pop-up window that displays the list of templates valid for the specific data call. The user needs to tick one or more check-boxes corresponding to the desired template(s) in order to create one or more formatted worksheet(s). When pressed, the ‘Execute’ button closes the window and adds to the workbook as many worksheets as there were templates selected.

Figure 3.4: Selection template window for the Fishing Effort Regime data call.

Figure 3.5: Selection template window for the Fleet Economic data call.

Once the worksheets are created they can be filled with data; it is possible to copy and paste the data from an external source (Excel sheet or text file) or to continue with step 2.

Important: For big quantities of data (for example: ‘catch’ data for the Fishing Effort Regime data call or ‘landings’ data for the Fleet Economic data call) it is recommended to open and work with only that one template. For the other templates use the DV tool in another Excel file.

3.2.1.2. Step 2: paste data into the template

In order to populate the worksheets and check whether the data meet the requirements of the data call, the first thing to do is to fill the empty cells with the data. The user has three options:
- type the data one by one,
- use the Excel functionalities to import them from another source (e.g. Excel worksheet, text files),
- press the ‘Import’ button on the control panel (step 2).
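For the third option, importing from a delimited text file, the general idea can be pictured with the following minimal Python sketch: the text is read with a chosen separator and its first row is compared with the template headings. This is an illustration only, not the VBA code of the DV tool; the function name, the sample headings and the sample values are hypothetical.

```python
import csv
import io


def import_delimited(text, separator, template_headings):
    """Parse delimited text and report whether its first row matches the template headings."""
    if separator not in text:
        # Assumption: an import makes no sense if the separator never occurs.
        raise ValueError("separator not found in the source data")
    records = list(csv.reader(io.StringIO(text), delimiter=separator))
    header, data = records[0], records[1:]
    headings_match = [h.strip() for h in header] == list(template_headings)
    return data, headings_match


# Hypothetical source content using ';' as separator.
sample = "COUNTRY;YEAR;VALUE\nITA;2012;15.5\nFRA;2012;7\n"
data, ok = import_delimited(sample, ";", ["COUNTRY", "YEAR", "VALUE"])
```

If the headings do not match, the imported columns are probably misaligned with the template and should be reviewed before running any check.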
The ‘Import data’ feature offered by the DV tool has the advantage that it imports the data just below the header row, so that the user can benefit from the format definition and compare the template headings with the headings (if they exist) of the external source. Several imports can be executed on the same worksheet, building up a set of data coming from different sources.

The external source can be of two types:
- .csv files (comma separated values),
- .txt files (text files).

To import data:
1. Select the worksheet template by clicking on any cell (note: the cell format is set to ‘General’, i.e. no specific number format).
2. Select the separator character used by the external source to separate the data.
3. Press the ‘Import data’ button. The default dialog window pops up for selecting the file to import. After selecting the file, click the ‘OK’ button; the application executes the import procedure if the separator is recognised in the external source data.

Figure 3.6: Import of data from a file with comma separator as delimiter.

A message window is shown to inform the user about the import process. Once the data is imported, please check carefully that the imported data correspond to the headings; otherwise the worksheet will never pass the validation checks.

During this process the system may prompt the user with messages; in these cases the user needs to confirm and continue with the process. One possible message, shown when working in Microsoft Excel 2013, is represented in Figure 3.7.

Figure 3.7: Prompt message during import process.

Important: The user can always copy and paste with the traditional Windows method. Another option for importing data as an Excel worksheet is to use Excel’s embedded import tool.
To use this option, click on the worksheet created with ‘Add template’, choose Data → Get External Data from Excel’s Data ribbon, select the columns and rows to import, set the column data type to General, and choose to import into cell B2 of the current worksheet (when necessary use also: Data → Text to Columns).

3.2.1.3. Step 3: check codification

Once data has been inserted in the template, the next step is to check the data against the codifications defined under the data call. To check codifications click on the red button. The user can choose to perform this check on the selected worksheet or on all the worksheets in the workbook by clicking the corresponding radio button on the panel to the right of the ‘Check Codification’ button. If ‘all worksheets’ is selected, the macro goes through all worksheets in the template file, one by one. If the ‘current worksheet’ option is selected, the macro checks only the active worksheet.

Figure 3.8: Check Codification process executed on the selected (active) worksheet.

Different types of checks are executed during this process:
1. Validity of the worksheet name.
2. Validity of the headings (same order, same spelling and same quantity as the ones defined in the templates).
3. Validity of the data below each heading, depending on its definition:
   a. checks on the type of value (numeric, alphanumeric, double…),
   b. checks on numeric range,
   c. checks on maximum number of characters,
   d. checks if empty cells are allowed,
   e. checks if the code is found in predefined lists.

Only after checks 1 and 2 have passed does the procedure of checking whether the data conform to the codification scheme described in the data call begin (point 3).

During the check, the only enabled button is ‘Suspend’. Clicking this button while the checks are running stops the process, and only the errors, if any, that have been detected until that moment are shown.
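The value checks listed under point 3 can be pictured with the following minimal Python sketch. It is an illustration only, not the VBA implementation of the DV tool; the layout of the rule dictionary, the column names and the code lists are hypothetical, and the real definitions come from the data call.

```python
def check_value(value, rule):
    """Return a list of problems found for one cell value under one column rule."""
    problems = []
    if value == "":
        # (d) empty cells are accepted only when the rule allows them
        if not rule.get("allow_empty", False):
            problems.append("empty value not allowed")
        return problems
    if rule.get("type") == "numeric":
        # (a) type of value, then (b) numeric range
        try:
            number = float(value)
        except ValueError:
            return ["not numeric"]
        low, high = rule.get("range", (None, None))
        if low is not None and number < low:
            problems.append("below minimum")
        if high is not None and number > high:
            problems.append("above maximum")
    # (c) maximum number of characters
    if "max_length" in rule and len(value) > rule["max_length"]:
        problems.append("too long")
    # (e) code must belong to a predefined list
    if "codes" in rule and value not in rule["codes"]:
        problems.append("code not in list")
    return problems


# Hypothetical column rules and one data row.
rules = {
    "COUNTRY": {"type": "text", "max_length": 3, "codes": {"ITA", "FRA"}},
    "YEAR": {"type": "numeric", "range": (2000, 2013)},
    "COMMENT": {"type": "text", "allow_empty": True},
}
row = {"COUNTRY": "XXX", "YEAR": "1999", "COMMENT": ""}
report = {name: check_value(row[name], rules[name]) for name in rules}
```

In this sketch every column is checked independently of the others; combinations of values across columns are the subject of the correspondence check (step 5).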
Two colours indicate errors or warnings in the record fields:
- Cells coloured red contain an invalid entry for a mandatory field. The user must correct them, otherwise they will fail the ‘Upload facility’ controls.
- Cells coloured orange contain an invalid entry for an optional field. Users are strongly advised to correct them.

After finishing this process, or after clicking ‘Suspend’, the user needs to click a second time on the button to get back to the original situation, with all the other buttons enabled again.

Clicking on a column heading and then on the icon for valid codes opens a separate Excel sheet with the valid codes, if any, belonging to the selected heading (see 3.2.2 Support area). The user should also consult the data call specification and, for the effort data call, the ‘Actions and associated error messages for data errors’ tables in the upload facility manual.

Important: After correcting the errors, always re-run the check to ensure that the changes are correct.

3.2.1.4. Step 4: check duplication

Click the ‘Check Duplication’ button in the control panel to start the check for duplicated Excel rows. Duplication is evaluated on a predefined set of columns whose combined values must differ from row to row for the data to make sense. This set of columns can differ from template to template. The columns considered in the duplication process are the ones with the coloured heading (note: the coloured headings are visible only if step 1 has been executed).

Figure 3.9: Duplication check panel.

The check can be executed for the selected worksheet or for all the worksheets, provided they are valid templates. This choice must be made from the side panel before pressing the ‘Check Duplication’ button. If no choice is made, only the selected worksheet is checked. A progress report can be viewed during the execution. The results are reported in an ad-hoc column created for this purpose at the beginning of the sheet (A1).
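As a rough illustration of a duplication check based on a subset of key columns, here is a minimal Python sketch. It is not the DV tool's VBA code; the function name and the column names are hypothetical, and row numbering assumes that row 1 is the header so that the first data row is row 2.

```python
def find_duplicates(rows, key_columns, first_row_number=2):
    """Map each duplicated row number to the number of the first row it matches."""
    seen = {}
    duplicates = {}
    for offset, row in enumerate(rows):
        row_number = first_row_number + offset
        # Only the key columns take part in the comparison.
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates[row_number] = seen[key]
        else:
            seen[key] = row_number
    return duplicates


# Hypothetical data: rows 2 and 3 share the same COUNTRY and YEAR.
rows = [
    {"COUNTRY": "ITA", "YEAR": "2012", "VALUE": "10"},
    {"COUNTRY": "ITA", "YEAR": "2012", "VALUE": "99"},
    {"COUNTRY": "FRA", "YEAR": "2012", "VALUE": "10"},
]
duplicates = find_duplicates(rows, ["COUNTRY", "YEAR"])
```

Note that two rows count as duplicates here even though their VALUE differs, because only the key columns are compared.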
In this ad-hoc column, all duplicated rows are coloured in orange, and the number of the current row is displayed together with the number of the row that it matches.

Figure 3.10: Duplication check example: row 5 is a duplication of row 4 and row 12 is equal to row 11.

Action: the user can choose to ignore the warnings or can delete all duplicated rows. To examine the duplicated rows use the support button at the top (this avoids having to scroll down the sheet to find duplicates). The ‘matched’ rows retain their original row number in the ad-hoc column. A message box will pop up for confirmation.

Important: Always repeat the checking procedure after correcting the errors. If not corrected, these errors will produce errors in the ‘Upload facility’ and the worksheet may be refused.

3.2.1.5. Step 5: check correspondences

This check involves more than one column on the same row; the set of columns is defined by the data call. For a specific value in one column, only a few values are allowed in certain other columns. It is also called the ‘horizontal check’ because it is done by comparing cells row by row.

Figure 3.11: Correspondence check panel.

Figure 3.12: Correspondence check example (Effort data call).

This check can be done on the selected worksheet or on all valid worksheets. This choice needs to be made before pressing the ‘check correspondence’ button.

During the process, the cells in the correspondence column set are coloured in yellow when they are not correct, and a message is written in the ‘check result’ column (A1). This process may take several minutes if the file contains many rows.

To correct a wrong correspondence the user needs to change one of the values involved by clicking on the cell. By clicking on the heading of column ‘A’ followed by the icon it is possible to see the valid correspondences.
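The ‘horizontal check’ can be sketched in Python as a lookup of each row's combination of values in a set of allowed combinations. This is a minimal illustration under assumptions, not the DV tool's implementation; the column names AREA and GEAR and the allowed combinations are invented for the example, and row 1 is assumed to be the header.

```python
def check_correspondence(rows, columns, valid_combinations, first_row_number=2):
    """Return the row numbers whose values in `columns` are not an allowed combination."""
    invalid = []
    for offset, row in enumerate(rows):
        combination = tuple(row[column] for column in columns)
        if combination not in valid_combinations:
            invalid.append(first_row_number + offset)
    return invalid


# Hypothetical allowed combinations of two columns.
valid = {("AREA_1", "GEAR_A"), ("AREA_1", "GEAR_B"), ("AREA_2", "GEAR_A")}
rows = [
    {"AREA": "AREA_1", "GEAR": "GEAR_B"},
    {"AREA": "AREA_2", "GEAR": "GEAR_B"},
]
invalid_rows = check_correspondence(rows, ["AREA", "GEAR"], valid)
```

Each value in the second row is valid on its own; only the combination is rejected, which is why this check is separate from the codification check.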
The user should also consult the data call specification and, for the effort data call, the ‘Actions and associated error messages for data errors’ tables in the upload facility manual.

Important: If not corrected, these errors will produce errors in the ‘Upload facility’ and the worksheet may be refused.

3.2.1.6. Step 6: export data

An ‘Export Data’ option is available on the control panel. This option allows the data present in the DV tool to be saved into new Excel files ready to be uploaded into the ‘Upload facility’ of the Data Collection web site. The new Excel files contain neither the embedded macros nor the ‘check results’ columns.

When a worksheet contains a large number of rows, the user is prompted to split the data into smaller files. This can be done by number of rows or by year. This choice produces more than one file per exported worksheet.

Figure 3.13: Option to split the file to smaller size files.

Figure 3.14: Choose to split the data by year.

After completing the operation, a message appears (Figure 3.15) informing the user about the folder where the files are saved. By default, the application creates a new folder in the folder of the DV tool file, named with the name of the template and the date/time.

Figure 3.15: Successful export message.

If the worksheets are split into smaller files, the files are numbered and carry the name of the worksheet. If the user chooses to split the worksheets by year, the names of the Excel files produced also include the year. An example is given in Figure 3.16.

Figure 3.16: Example of exported data in the created directory (export by year).

Important: Before exporting the data set, always delete any worksheet that you do not intend to keep. The ‘check result’ column and the macros are automatically removed from the exported worksheet, but any coloured cells are kept if not removed by the user.

3.2.2. Support area

The support area is a set of icons that the user can find in the upper right corner of the control panel. Their functions are:
- Erases all error and warning messages that have been produced by the checks. The effect is only on the selected worksheet.
- Pushes the error messages to the top of the worksheet, to avoid scrolling down each time the user wants to see the affected rows.
- Turns on and off the ‘Check result’ column, situated as the first column on every worksheet. This column is automatically removed during the export process, producing an external file that is suitable for the upload of data.
- Retrieves some of the code definitions, i.e. the list of valid codes that can be inserted.
- Links to the Upload page of the Data Collection web site. This page lists all the data calls, the periods of the data calls and a link to download the manual of the ‘Upload facility’.
- Links to the open calls web page of the Data Collection web site. Here you can find information such as the official letter, template explanations, links to relevant legislation and notes that may arise during the data call.

4. Important Notes

This manual is for the current version of the DV Tool, 4.14.

Users are strongly advised to download and install the latest version of Microsoft’s Office Service Pack in case they face any compatibility problems.

The DV Tool has been tested successfully on Windows XP, Windows 7 and Windows 8 operating systems.

When dealing with a large number of records please be patient while checking or exporting data.

When the Office 2003 versions are used, the total number of rows is limited to 65535. However, the Office 2007 version can handle over 1 million rows of data.

For questions, suggestions, problems, bug reporting and support, send an email directly to the following address: [email protected]

5. Disclaimer of warranty

THE APPLICATION AND MANUAL ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE CODE IS WITH YOU. SHOULD ANY CODE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. NO USE OF ANY CODE, APPLICATION OR MANUAL IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.

European Commission
EUR XXXXX LL – Joint Research Centre – Institute for the Protection and Security of the Citizen
Title: JRC Data Validation Tool for the Fleet Economic Scientific Data Call through the EU Data Collection Regulation Framework - DV Tool 3.0 - User Manual
Author: JRC G.04 FISHREG Data Collection Team
Luxembourg: Publications Office of the European Union
2011 – 21 pp. – 21 x 29.7 cm
EUR – Scientific and Technical Research series – ISSN XXXX-XXXX
ISBN X-XXXX-XXXX-X
doi:XXXXX

Abstract
The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and embedded in specifically designed template Excel Workbooks. The main purpose of this tool is to facilitate and support the Member States in uploading data which meet the requirements defined by DG MARE in the official DCF data calls. The use of these Excel Template files is not mandatory. However, the data validation checks performed by the DV tool can greatly reduce the number of erroneous records contained in a file to be uploaded to the DCF web site, and hence facilitate the uploading procedure. The current version of the DV tool is 4.14. The purpose of this user manual is to provide the concept of this tool and guidelines for its use.