Company Name Cleansing And Normalization

openprise!
Company Name Cleaning & Normalization
Cook Book Series
Recipe Overview
This is a recipe for cleaning and normalizing company name data:
•  Clean and reformat company names for readability
•  Create company-alias master list
•  Normalize company name data using master list
You will need the following raw data:
•  Company name
•  Add a rule by clicking on an existing rule and +.
•  Put new data into a new data attribute so you can easily compare before vs. after and confirm the rule is doing what it is supposed to do.
•  Some configurations are found by clicking on:
•  Can’t see the open reference data? Check the setting in your Data Catalog.
•  The company-alias master list is generated using a machine algorithm. It is very accurate but never perfect. It is highly recommended that you review and tweak the master list before using it to normalize company names.
•  Experiment with the fuzzy matching algorithm parameters to get the best results.
Pipeline 1, Rule 1: Clean Company Name
TIP: There are options to remove or expand words like Inc, Corp, Ltd. We highly recommend removal: the names are easier to read and generate a better master list.
Reference Data
Cleaned Company Names With Inc, Corp, Ltd Removed
Cleaned Company Names With Inc, Corp, Ltd Expanded
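To make the difference between the "removed" and "expanded" options concrete, here is a minimal Python sketch. It is not the product's implementation: the suffix table, the function name `clean_company_name`, and the `mode` argument are all hypothetical, and the sketch only handles a single trailing suffix.

```python
import re

# Hypothetical suffix table (abbreviation -> expanded form); not the product's reference data.
LEGAL_SUFFIXES = {
    "inc": "Incorporated",
    "corp": "Corporation",
    "ltd": "Limited",
}


def clean_company_name(raw: str, mode: str = "remove") -> str:
    """Tidy whitespace, then remove or expand one trailing legal suffix."""
    # Collapse runs of whitespace and trim stray spaces, commas, and periods at the ends.
    name = re.sub(r"\s+", " ", raw).strip(" ,.")
    # Split off the last word; head is empty for single-word names, which are left alone.
    head, _, last = name.rpartition(" ")
    key = last.strip(",.").lower()
    if head and key in LEGAL_SUFFIXES:
        head = head.rstrip(" ,")  # drop the comma in names like "Acme, Inc."
        name = head if mode == "remove" else f"{head} {LEGAL_SUFFIXES[key]}"
    return name


REMOVED = clean_company_name("  Acme,   Inc. ", mode="remove")
EXPANDED = clean_company_name("  Acme,   Inc. ", mode="expand")
```

Writing the result to a new variable rather than overwriting the input mirrors the tip about putting new data into a new attribute so that before and after can be compared.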
Pipeline 1, Rule 2: Build Master List
Make sure to use the cleaned company
name, not the original company name.
Start with these default values. See tuning tips on the next page.
Company-Alias Master List Generated
•  The higher the fuzziness index, the more closely the names have to match to be grouped together. For example:
    •  “UBS Financial” and “ABC Financial” will match on high index ~ 0.8
    •  “UBS Financial” and “UBC Finland” will match on lower index ~ 0.3
•  The leading index dictates what % of leading text must match for the names to be grouped together. For example:
    •  “Department of Motor Vehicles Arizona” and “Department of Motor Vehicles Alabama” will match on an index of 70%
    •  “DMV Arizona” and “DMV Alabama” will not match on an index of 70%
•  Short names can create many false groupings. Increase minimum character index to reduce matching on short names. For example:
    •  CSC vs. USC, or NBC vs. NBA
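The interaction of the three tuning parameters can be sketched in Python. This is an illustration under stated assumptions, not the product's algorithm: similarity is approximated with the standard library's `difflib.SequenceMatcher`, the leading index is taken as the share of the shorter name covered by the common prefix, names shorter than the minimum character count only match exactly, and the first name seen in a group becomes its master. All function names and default values are hypothetical, so the scores will not equal the index values quoted above, although the ordering of the examples is the same.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in for the real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leading_share(a: str, b: str) -> float:
    """Share of the shorter name that is covered by the common leading text."""
    a, b = a.lower(), b.lower()
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return prefix / shortest


def is_match(a, b, fuzziness=0.8, leading=0.0, min_chars=4):
    """Decide whether two names belong in one group (hypothetical defaults)."""
    # Short names such as CSC and USC must match exactly to avoid false groupings.
    if min(len(a), len(b)) < min_chars:
        return a.lower() == b.lower()
    return leading_share(a, b) >= leading and similarity(a, b) >= fuzziness


def build_master_list(names, **params):
    """Map every alias to the first-seen name of its group."""
    masters = []
    alias_to_master = {}
    for name in names:
        master = next((m for m in masters if is_match(name, m, **params)), None)
        if master is None:
            masters.append(name)
            master = name
        alias_to_master[name] = master
    return alias_to_master


MASTER_LIST = build_master_list(["Acme Systems", "Acme System", "Globex"], fuzziness=0.8)
```

Because the grouping is greedy and order dependent, the resulting list still needs the manual review recommended earlier.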
Pipeline 2, Rule 1: Normalize Co. Name
Reference is the Master List produced by Pipeline 1 Rule 2.
Normalize the cleaned company names produced by Pipeline 1 Rule 1.
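Conceptually, this normalization step is a lookup of each cleaned name in the alias-to-master mapping. The Python sketch below assumes the master list is available as a plain dictionary and that names missing from it pass through unchanged; the function name and sample data are hypothetical.

```python
def normalize_company_names(cleaned_names, master_list):
    """Replace each cleaned name with its master name when an alias is known."""
    # Case-insensitive lookup so "ACME SYSTEM" and "Acme System" resolve alike.
    lookup = {alias.lower(): master for alias, master in master_list.items()}
    return [lookup.get(name.lower(), name) for name in cleaned_names]


# Hypothetical reviewed master list: alias -> master name.
MASTER_LIST = {
    "Acme Systems": "Acme Systems",
    "Acme System": "Acme Systems",
    "Acme Sys": "Acme Systems",
}

NORMALIZED = normalize_company_names(["ACME SYSTEM", "Acme Sys", "Globex"], MASTER_LIST)
```

Feeding in the cleaned names rather than the raw ones matters because the aliases in the master list were themselves built from cleaned names.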
Company Names Cleaned Then Normalized
Recipe Review
Recommendations
•  For marketing systems, consider reducing the master list down to only customers and target accounts. It greatly reduces maintenance efforts.
Want to do more? Try the following on your own:
•  In addition to normalizing company name, add parent company data to the master list and append pipeline and sales data with parent company information. This enables aggregated reporting and account based marketing.
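One possible shape for the parent company exercise is sketched below in Python. The master list layout (alias mapped to a normalized name and a parent), the function name, and the opportunity figures are all made up for illustration; names absent from the master list are assumed to roll up to themselves.

```python
from collections import defaultdict

# Hypothetical master list rows: alias -> (normalized name, parent company).
MASTER_LIST = {
    "Acme Robotics": ("Acme Robotics", "Acme Holdings"),
    "Acme Labs": ("Acme Labs", "Acme Holdings"),
    "Globex": ("Globex", "Globex"),
}


def pipeline_by_parent(opportunities, master_list):
    """Sum opportunity amounts per parent company."""
    totals = defaultdict(float)
    for company, amount in opportunities:
        # Unknown companies are reported under their own name.
        _, parent = master_list.get(company, (company, company))
        totals[parent] += amount
    return dict(totals)


# Illustrative pipeline records: (normalized company name, amount).
OPPORTUNITIES = [
    ("Acme Robotics", 50_000.0),
    ("Acme Labs", 20_000.0),
    ("Globex", 10_000.0),
    ("Initech", 5_000.0),
]

REPORT = pipeline_by_parent(OPPORTUNITIES, MASTER_LIST)
```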
Data Automation For Business Users
Rules
Sharing
Analytics
[email protected]
Twitter: @openprisetech
www.openprisetech.com
