JRC Data Validation tool for Data Calls collected under

Transcription

JRC Data Validation tool for Data Calls collected under
JRC Data Validation tool
for Data Calls collected under
the EU Data Collection Framework
DV Tool 4.14
User Manual
JRC G.04 Data Collection Team
EUR xxxxx EN - 2013
The mission of the JRC-IPSC is to provide research results and to support EU policy-makers in
their effort towards global security and towards protection of European citizens from accidents,
deliberate attacks, fraud and illegal actions against EU policies.
European Commission
Joint Research Centre
Institute for the Protection and Security of the Citizen
Contact information
Address: Via Fermi 2749, T.P. 051 – Ispra (VA), 21027 - Italy
E-mail: [email protected]
Tel.: 0332 786479
Fax: 0332 789658
http://ipsc.jrc.ec.europa.eu/
http://www.jrc.ec.europa.eu/
Legal Notice
Neither the European Commission nor any person acting on behalf of the Commission
is responsible for the use which might be made of this publication.
Europe Direct is a service to help you find answers
to your questions about the European Union
Freephone number (*):
00 800 6 7 8 9 10 11
(*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed.
A great deal of additional information on the European Union is available on the Internet.
It can be accessed through the Europa server http://europa.eu/
JRC [PUBSY request]
EUR XXXXX LL
ISBN X-XXXX-XXXX-X
ISSN XXXX-XXXX
doi:XXXXX
Luxembourg: Publications Office of the European Union
© European Union, 2013
Reproduction is authorised provided the source is acknowledged
Printed in Italy
DV Tool 4.14 User Manual
TABLE OF CONTENTS
1.
Introduction ............................................................................................................................. 3
2.
DV tool and Microsoft Office Excel versions ............................................................................. 4
3.
How to use the DV tool features .............................................................................................. 4
3.1.
3.2.
AT OPENING ............................................................................................................................................... 4
DV TOOL CONTROL PANEL........................................................................................................................... 5
3.2.1.
Data Check area ......................................................................................................................... 6
3.2.1.1.
3.2.1.2.
3.2.1.3.
3.2.1.4.
3.2.1.5.
3.2.1.6.
3.2.2.
Step 1: add template ........................................................................................................................... 6
Step 2: paste data into the template..................................................................................................... 8
Step 3: check codification .................................................................................................................... 9
Step 4: check duplication ................................................................................................................... 10
Step 5: check correspondences ......................................................................................................... 11
Step 6: export data ............................................................................................................................ 12
Support area ............................................................................................................................. 14
4.
Important Notes .................................................................................................................... 14
5.
Disclaimer of warranty ........................................................................................................... 15
1
DV Tool 4.14 User Manual
LIST OF FIGURES
Figure 3.1: The DV tool button in Excel’s Home ribbon. ....................................................................................... 4
Figure 3.2: The DV tool control panel. .................................................................................................................. 5
Figure 3.3: Buttons of the ‘support area’. .............................................................................................................. 6
Figure 3.4: Selection template window for the Fishing Effort Regime data call ..................................................... 7
Figure 3.5: Selection template window for the Fleet Economic data call. ............................................................. 7
Figure 3.6: Import of data from a file with comma separator as delimiter. ............................................................. 8
Figure 3.7: Prompt message during import process. ............................................................................................ 9
Figure 3.8: Check Codification process executed on the selected (active) worksheet. ......................................... 9
Figure 3.10: Duplication check panel. ................................................................................................................. 10
Figure 3.11: Duplication check example: row 5 is a duplication of row 4 and row 12 is equal to row 11............. 11
Figure 3.12: Correspondence check panel. ........................................................................................................ 11
Figure 3.13: Correspondence check example (Effort data call). ......................................................................... 11
Figure 3.14: Option to split the file to smaller size files. ...................................................................................... 12
Figure 3.15: Choose to split the data by year...................................................................................................... 12
Figure 3.16: Successful export message. ........................................................................................................... 13
Figure 3.17: Example of exported data in the created directory (export by year). ............................................... 13
2
DV Tool 4.14 User Manual
1. Introduction
The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and
embedded in a specifically designed Excel Workbook. The main purpose of this tool is to facilitate and
support the Member States in uploading data to meet the requirements defined by DG MARE in official
DCF data calls by STECF (Council Regulation 199/2008). The use of this tool is not mandatory.
However, the data validations checks performed by the DV tool can greatly reduce the number of
erroneous records contained in a file to be uploaded to the DCF web site, and hence facilitate the
uploading procedure.
The tool is capable of checking national data stored in Excel rows against certain codifications and
rules as requested in the data call. Checks can be performed on partial or whole data sets to be
uploaded to the DCF web site. The majority of the checks concern the use of valid codes as defined in
the data call and the type of the data entered (numeric or text). Erroneous data are identified and can
easily be corrected.
The DV tool is design to check data that are provided in predefined worksheets depending on the
type of data call. Each data call has a dedicated DV tool.
The first version of the DV tool was developed to serve the needs of the Fishing Effort Regimes and
the Mediterranean & Black Sea data calls. The DV tool was a set of excel files one for each set of data
type requested by the data call. A second version was produced for the Fleet Economic data call in 2013.
It contained improvements based on experience acquired with the previous data calls:

a single excel file containing all the data templates and with

the possibility to automatically update, directly from the database, the codes used by the DV
tool for the validation of the data.
It was realised that ideally the JRC should develop a DV tool that could serve different data calls while
remaining easy to maintain. Therefore the original DV tool, developed by Nikolaos Mitrakis in 2011, has
been re-engineered and re-designed from scratch to allow the flexibility and maintainability desired.
However, the tool is still under development and any feedback from users that could help improve the
tool according to their needs is most welcomed. Any changes in the data call imply coherent adaptation
of the modules.
The purpose of this manual is to provide the general concept surrounding the tool and guidelines for
its use. Since the tool is integrated in Excel Workbooks, all Excel functions are still available. The tool is
available for download from the Data Collection Framework web site:
https://datacollection.jrc.ec.europa.eu
3
DV Tool 4.14 User Manual
2. DV tool and Microsoft Office Excel versions
To benefit from the DV tool functionalities it is necessary to run it on a Windows operating system
where Microsoft Excel software is installed. The DV tool can be used within Microsoft Excel 2003, 2007,
2010 and 2013; either 32bits or 64bits.
3. How to use the DV tool features
The DV tool stands for Data Validation tool. In this chapter are explained the functionalities and the
possible steps to be performed in order to create final excel data files that can be uploaded to the
‘Upload Facility’ available at the Data Collection web site.
3.1. At opening
When opening the DV tool file with Microsoft Office 2007/2010 (see below for Microsoft Office 2003),
a ‘Security Warning’ may appear with the message:
“Macros have been disabled. Options…”
This is a standard security option in all Microsoft’s applications, since macros can potentially have
access to your data. In order to continue with the use of the tool, the user should choose:
“OptionsEnable this content”.
To change the security settings go to “DeveloperCodeMacro SecurityDisable” all macros with
notification. However, we recommend that the default option not be changed (although the DV tool is a
trusted application, other macros can potentially cause several problems). Once the Macros are enabled
the DV Tool ribbon is visible in the Home tab of the Excel application.
Figure 3.1: The DV tool button in Excel’s Home ribbon.
In order to use the DV tool with MS Office 2003 version it is necessary to download and install (if not
already installed) “Microsoft’s Office Compatibility Pack for Word, Excel and PowerPoint File Formats”
which is available for free at: http://www.microsoft.com/en-us/download/details.aspx?id=3
Attention should be given to the restriction of a maximum number of 65535 rows (the first row is
always the header). However, this is the theoretical limit of rows in Excel 2003. In practise, and especially
4
DV Tool 4.14 User Manual
when dealing with large files, large number of rows can cause significant computational time or even an
Excel not responding status.
Users are strongly advised to use Excel 2007 when dealing with more than 20000 rows especially
when dealing with the catch data table (sheet name EFF_01_CATCH). One other option is first to
manually split the data set into smaller sizes (e.g. one sheet per year of data) and then insert or paste to
the DV tool. Apply the DV tool separately for each one of these smaller data sets.
The environment is the same except for the option to recall and show the control panel from the
Office ribbon since this is not available for pre-Office 2007 versions. In order to re-show the DV control
panel, simply use this key combination: Ctrl + J or just save and re-open the template.
To change the security settings for the Macros go to Tools  Macro  Security  Security Level 
Medium.
3.2. DV tool control panel
The DV tool control panel is the first and the main window that pops-up when opening the Workbook
(Figure 3.2). Here, users can access all the functions that the DV tool provides.
Figure 3.2: The DV tool control panel.
5
DV Tool 4.14 User Manual
The panel window consists of two main areas: (1) a ‘Data Checking’ area and (2) a ‘Support’ area.
The ‘Data Check’ area is made of step-buttons (see the big coloured buttons in Figure 3.2) that can
be performed on a single worksheet or on all worksheets at once. While the blue buttons act on the
worksheet as a whole (create, export, import…), the other buttons act on the data contained in the
worksheet(s) using the corresponding colour to identify the type of error-check.
The ‘Support’ area is made of smaller buttons visible at the top right corner of the control panel.
These buttons can be used to manipulate the results of the checks and to request help.
Figure 3.3: Buttons of the ‘support area’.
3.2.1.
Data Check area
Users are strongly advised to follow each step in order to guarantee the appropriate data checks. The
available buttons include:

Step 1
 Add a template structure

Step 2
 Import Data

Step 3
 Codification check = checks if worksheets conform to the data call template definition

Step 4
 Duplication check = check for duplicated rows, the columns involved in this process are
established by the template definition

Step 5
 Correspondence check = checks if cells of predefined columns on the same row contain valid
combination values

Step 6
 Export Data = exports the data in the workbook to files ready to be uploaded to the DCF web
site
Step1 and Step2 are not necessary if the user intends to import existing excel worksheets by the use
of excel functionalities.
3.2.1.1.
Step 1: add template
In order to upload the DCF data into the ‘Upload facility’ of the Data Collection web site the user
needs to prepare the data in a specific format. This format can be different from data call to data call
because of requests for a different set of data aggregated in different levels. The specific format is
defined by templates; these templates are excel worksheet with pre-defined headings.
By clicking on the ‘add worksheet template’ a pop-up window exposes the list of templates valid for
the specific data call. The user needs to click one or more check-boxes in correspondence to the desired
6
DV Tool 4.14 User Manual
template(s) in order to create one or more formatted worksheet(s). When pressed, the ‘Execute’ button
will close the window and add into the workbook as many worksheets as there were templates selected.
Figure 3.4: Selection template window for the Fishing Effort Regime data call
Figure 3.5: Selection template window for the Fleet Economic data call.
7
DV Tool 4.14 User Manual
Once the worksheets are created they can be filled with data; it is possible to copy and paste them
from an external source (excel sheet or text file) or continue with step 2.
Important: For big quantities of data (for example: ‘catch’ data for Fishing Effort Regime data call or
‘landings’ data for the Fleet Economic data call) it is recommended to open and work with only the one
template. For the other templates use the DV tool in another excel file.
3.2.1.2.
Step 2: paste data into the template
In order to populate the worksheets with data and check whether the data meet the requirements of
the data call the first thing to do is to fill up the empty cells with the data. The user has three options:

type the data one by one,

use the excel functionalities to import them from another source (i.e. excel worksheet, text
files),

press the ‘Import’ button on the control panel (step 2).
The ‘Import data’ feature offered by the DV tool has the advantage that it imports the data just below
the header row so that the user can benefit from the format definition and compare the headings with the
headings (if they exist) of the external source. Several imports can be executed on the same worksheet
building up a set of data coming from different sources. The possible external sources can be of two
types:

.csv files (comma separated values),

.txt files (text files).
First, select the worksheet template by clicking on any cell (note: cell format is set to ‘General’, i.e. no
specific number format). Second, select the character separator used by the external source to separate
the data. Third, press the ‘Import data’ button and navigate to select the external source. When you click
on the ‘Import data’ button, the default Dialog Window will pop up for selecting the file to import. After
selecting the file click the ‘OK’ button and the application will execute the importing procedure if the
separator is recognised among the external source data.
Figure 3.6: Import of data from a file with comma separator as delimiter.
A message window is shown to inform the user about the import process. Once the data is imported
please check carefully that the imported data correspond to the headings; otherwise the worksheet will
never pass the validation checks.
During this process the system may prompt the user with messages; in these cases the user needs to
confirm and continue with the process. One possible message could be the one represented in Figure
3.7 when working in Microsoft Excel 2013.
8
DV Tool 4.14 User Manual
Figure 3.7: Prompt message during import process.
Important: The user can always copy and paste with the traditional windows style method. Another
option for importing data as an Excel worksheet is to use Excel’s embedded import tool. To use this
option, click on the Add Template worksheet, choose from Excel’s Data ribbon: Data  Get External
Data, select columns and rows to import, select from data type in Columns (General), and choose to
import in the B2 Cell of the current worksheet (when necessary use also: Data  Text to column).
3.2.1.3.
Step 3: check codification
Once data has been inserted in the template, the next step is to check the data against codifications
defined under the data call.
To check codifications click on the red button. The user can choose to perform this check on the
selected worksheet or on all the worksheets in the workbook by clicking the corresponding radio button
on the right panel of the ‘Check Codification’ button. If ‘all worksheets’ is selected the macro will go
through all worksheets in the template file, one by one. If the ‘current worksheet’ option is selected, the
macro will check only the active worksheet.
Figure 3.8: Check Codification process executed on the selected (active) worksheet.
Different type of checks are executed during this process:
1. Validity of the worksheet name.
2. Validity of the headings (same order, same spelling and same quantity as the ones defined in the
templates)
3. Validity of the data below each headings depending on their definition:
a. checks on the type of value (numeric, alphanumeric, double…),
b. checks on numeric range,
c.
checks on maximum number of characters,
d. checks if empty cells are allowed,
e. checks of code is found in predefined lists.
Only after successfully passing checks 1 and 2, does the procedure of checking if the data conforms
to the codification scheme described in the data call begins (point 3).
9
DV Tool 4.14 User Manual
During the check, the only enabled button is ‘Suspend’. Clicking this button while running the checks,
stops the process and only the errors, if any, that have been detected until that moment are shown.
There are two colour definition for indicating warnings or errors in the record fields:
=
Cells with red colour contain an invalid entry for mandatory fields. The user must correct
them otherwise they will fail the ‘Upload facility’ controls.
=
Cells with orange colour contain an invalid entry for an optional field. The users are
strongly advised to correct them.
After finishing this process, or after clicking ‘Suspend’, the user needs to click a second time on the
button to get back to the original situation with all the other buttons again enabled.
Clicking on a column heading and the
icon allows the user to open a separate excel sheet with the
valid codes, if any, belonging to the selected heading (see 3.2.2 Support area, page 14). The user should
also consult the data call specification and – for the effort data call – the ‘Actions and associated error
messages for data errors’ tables in the upload facility manual.
Important: After correcting the errors, always re-evaluate to ensure that the changes are correct.
3.2.1.4.
Step 4: check duplication
Click the ‘Check Duplication’ button in the control panel to start the check of duplicated excel rows.
The duplication is based on predefined columns that need to be different to give sense to the data. Those
set of columns can differ from template to template. The columns considered in the duplication process
are the ones with the coloured heading (note: the coloured headings are visible only if step1 has been
executed).
Figure 3.9: Duplication check panel.
The check can be executed for the selected worksheet or for all the worksheets if they are valid
templates. This choice must be done from the side panel before pressing the ‘Check Duplication’ button.
If no choice is made then only the selected worksheet is checked.
A progressive report can be viewed during the execution. The results are reported in an ad-hoc
column created for this purpose at the beginning of the sheet: A1. In this column, all duplicated rows are
coloured in orange and the number of the current row is displayed coupled with the number of the row
with which it matches.
10
DV Tool 4.14 User Manual
Figure 3.10: Duplication check example: row 5 is a duplication of row 4 and row 12 is equal to row 11.
Action: the user can choose to ignore the warnings or can delete all duplicated rows. To examine the
duplicated rows use the support button at the top
(this avoids having to scroll down the sheet to find
duplicates). The ‘matched’ rows retain their original row number in the ad-hoc column. A message box
will pop-up for confirmation.
Important: Always repeat the checking procedure after correcting the errors. If not corrected these errors
will produce errors in the ‘Upload facility’ and the worksheet may be refused.
3.2.1.5.
Step 5: check correspondences
This check involves more than one column on the same row, this set of columns is defined by the
data calls. For a specific value in a column only a few possible values can be inserted in certain other
columns. It is also named as the ‘horizontal check’ because it is done comparing cells row by row.
Figure 3.11: Correspondence check panel.
Figure 3.12: Correspondence check example (Effort data call).
This check can be done on the selected worksheet or on all valid worksheets. This choice needs to
be made before pressing the ‘check correspondence’ button.
11
DV Tool 4.14 User Manual
During the process the cells that are in the correspondence column set are coloured in yellow when
not correct and a message is written on the ‘check result’ column (A1). This process may take several
minutes if the file contains many rows.
To correct the wrong correspondences the user needs to change one of the values involved by
clicking on the cell. By clicking on the heading of column ‘A’ followed by the
icon it is possible to see
the possible correspondences. The user should also consult the data call specification and – for the effort
data call – the ‘Actions and associated error messages for data errors’ tables in the upload facility
manual.
Important: If not corrected these errors will produce errors in the ‘Upload facility’ and the worksheet may
be refused.
3.2.1.6.
Step 6: export data
An ‘Export Data’ option is available on the control panel. This option allows the data present in the DV
tool to be saved into new excel files ready to be uploaded into the ‘Upload facility’ of the Data Collection
web site. The new excel files are without the embedded macros and without the ‘check results’ columns.
When a worksheet contains a large number of rows, the user is prompted to split the data into smaller
size files. This can be done by number of rows or by years. This choice produces more than one file per
worksheet exported.
Figure 3.13: Option to split the file to smaller size files.
Figure 3.14: Choose to split the data by year.
After completing the operation, a message appears (Figure 3.15) informing the users about the folder
where the files are saved. By default, the application creates a new folder in the folder of the DV tool file,
with the name of the template and the date/time.
12
DV Tool 4.14 User Manual
Figure 3.15: Successful export message.
If the worksheets are split in smaller files, the files are enumerated with the name of the worksheet. In
case the user chooses to split the worksheets by year then the excel files produced are named also with
the year number. An example is given in Figure 3.16.
Figure 3.16: Example of exported data in the created directory (export by year).
Important: Before exporting the data set always delete any worksheet that you do not intend to keep.
The ‘check result’ column and the macros are automatically removed from the worksheet exported but
any coloured cells are kept if not removed by the user.
13
DV Tool 4.14 User Manual
3.2.2.
Support area
The support area is a set of icons that the user can find in the upper right corner of the control panel.
Definition for all of them follow:
If pressed it erases all error and warning messages that have been produced after the
checking. The effect is only on the selected worksheet.
It pushes the error messages to the top of the worksheet to avoid scrolling down each
time the user wants to see the affected rows.
Turn on and off the ‘Check result’ column situated as the first column on every worksheet.
This column is automatically removed during the export process producing an external
file that is suitable for the upload of data.
By clicking this icon you will retrieve some of the code definitions, the list of possible valid
codes to be inserted.
This icon is a link to the Upload page of the Data Collection web site. This page lists all
the data calls, the periods of the data calls and a link to download the manual of the
‘Upload facility’.
This icon is a link to the open calls web page of the Data Collection web site. Here you
can find information like: official letter, template explanations, links to relevant legislation
and notes that can occur during the data call.
4. Important Notes

This manual is for the current version of the DV Tool 4.14.

The users are strongly advised to download and install the latest version of Microsoft’s Office Service
Pack in case they face any compatibility problems, which can be found here.

The DV Tool has been tested successfully on Windows XP, Windows 7 and Windows 8 operating
systems.

When dealing with a large number of records please be patient while checking or exporting data.

When the Office 2003 versions are used, the total number of rows is limited to 65535. However, the
Office 2007 version can handle over 1 million rows of data.

For questions, suggestions, problems, bug reporting and support, send a direct email to the following
email address: [email protected]
14
DV Tool 4.14 User Manual
5. Disclaimer of warranty
THE APPLICATION AND MANUAL ARE PROVIDED ON AN "AS IS'' BASIS, WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT
LIMITATION, WARRANTIES THAT THE CODE IS FREE OF DEFECTS, MERCHANTABLE,
FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE CODE IS WITH YOU. SHOULD ANY CODE
PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY
OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR
OR CORRECTION. NO USE OF ANY CODE, APPLICATION OR MANUAL IS AUTHORIZED
HEREUNDER EXCEPT UNDER THIS DISCLAIMER.
15
European Commission
EUR XXXXX LL – Joint Research Centre – Institute for the Protection and Security of the Citizen
Title: JRC Data Validation Tool for the Fleet Economic Scientific Data Call through the EU Data Collection
Regulation Framework - DV Tool 3.0 - User Manual
Author: JRC G.04 FISHREG Data Collection Team
Luxembourg: Publications Office of the European Union
2011 –21 pp. – 21 x 29.7 cm
EUR – Scientific and Technical Research series – ISSN XXXX-XXXX
ISBN X-XXXX-XXXX-X
doi:XXXXX
Abstract
The Data Validation (DV) tool is a set of macros developed in Visual Basic for Applications (VBA) and
embedded in specifically designed template Excel Workbooks. The main purpose of this tool is to facilitate and
support the Member States in uploading data which meet the requirements defined by DG MARE in the official
DCF data calls. The use of these Excel Template files is not mandatory. However, the data validations checks
performed by the DV tool can greatly reduce the number of erroneous records contained in a file to be uploaded
to the DCF web site, and hence facilitate the uploading procedure. The current version of the DV tool is 4.14.
The purpose of this user manual is to provide the concept of this tool and guidelines for its use.
How to obtain EU publications
Our priced publications are available from EU Bookshop (http://bookshop.europa.eu), where you can place
an order with the sales agent of your choice.
The Publications Office has a worldwide network of sales agents. You can obtain their contact details by
sending a fax to (352) 29 29-42758.
AA-BB-XXXXX-LL-C
The mission of the JRC is to provide customer-driven scientific and technical support for the
conception, development, implementation and monitoring of EU policies. As a service of the
European Commission, the JRC functions as a reference centre of science and technology
for the Union. Close to the policy-making process, it serves the common interest of the
Member States, while being independent of special interests, whether private or national.