Match Job Advanced Settings


Overview

Advanced settings can be configured on a Match job after the job has been saved for the first time. To access advanced settings, navigate to the Edit Match job page and click the Advanced settings button in the Standard Settings section.

Matching Score

Match uses scoring to control what is presented as a potential match, reducing false positive matches and increasing correctly matched data. During the run of the Match job, records with the same match keys are grouped together and then compared with each other. Refer to Match Keys below for more details on match keys.

Match uses the scoring weights for specific fields to determine an overall match score. If the score reaches the defined threshold score for a given match level, then a matching pair is written to the results along with the score achieved.

The following score details must be set for a Match job:

  • Minimum Score—sets the minimum score for records to be recognized as a potential match. For example, if the minimum score setting is 80, then the combined component scores must be 80 or higher for the records to be considered a match.

  • Max Cluster Size—sets the maximum number of records a cluster can contain before it is reported as a large cluster. Clusters are the groupings of records created by the initial matching done on the match keys.

  • Output % Score—when toggled to the On position, the overall score is displayed on the Match job record results to show how similar records are.
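
The following Python sketch illustrates how these settings interact: records are grouped into clusters by match key, clusters larger than Max Cluster Size are reported, and only pairs scoring at or above Minimum Score are kept. The function names, record fields, and toy scoring function are hypothetical stand-ins for Match's internal weighted scoring. Because this section only says that large clusters are reported, the sketch still compares their members.

```python
from collections import defaultdict
from itertools import combinations

def find_matches(records, key_of, score_pair, minimum_score, max_cluster_size):
    """Cluster records by match key, score each pair in a cluster, keep pairs at or above the minimum."""
    clusters = defaultdict(list)
    for record in records:
        clusters[key_of(record)].append(record)

    matches, large_clusters = [], []
    for key, members in clusters.items():
        if len(members) > max_cluster_size:
            large_clusters.append(key)  # reported only; this sketch still compares the members
        for a, b in combinations(members, 2):
            score = score_pair(a, b)
            if score >= minimum_score:
                matches.append((a["id"], b["id"], score))
    return matches, large_clusters

# Hypothetical scoring function standing in for Match's weighted component scores.
def toy_score(a, b):
    score = 50 if a["postcode"] == b["postcode"] else 0
    if a["first"] == b["first"]:
        score += 50
    elif a["first"][:1] == b["first"][:1]:
        score += 35
    return score

records = [
    {"id": 1, "last": "SMITH", "first": "JOHN", "postcode": "AB1 2CD"},
    {"id": 2, "last": "SMITH", "first": "JON", "postcode": "AB1 2CD"},
    {"id": 3, "last": "JONES", "first": "MARY", "postcode": "EF3 4GH"},
]
matches, large = find_matches(records, lambda r: r["last"][:4], toy_score,
                              minimum_score=80, max_cluster_size=2)
print(matches)  # [(1, 2, 85)]
```

Here the first four letters of the last name act as a hypothetical match key, so records 1 and 2 share a cluster and score 85, which clears a minimum of 80.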

Match Keys

Match keys are defined fields in each record used to compare and group potentially matching records. Refer to Match Keys Configuration for details on available match keys.

To add a Match key:

  1. Click the Add key button.

  2. On the Add Key dialog box, select the Type: Exact requires an exact match, and Fuzzy allows a similar match.

  3. Choose a field type from the Fields list box.

  4. Click the Add button. The Field type is added to the grid.

  5. For the field added to the grid, select the Function. Refer to Match Key Functions for more details.

  6. Enter the desired Start and Length for the key.

  7. Toggle the Optional setting to the On position if the field is optional for matching. Otherwise, leave this setting in the Off position.

  8. Click the Save button to save the match key(s).

Note

More than one key can be added on the Add Key dialog box before saving. Repeat steps 2 through 7 above to add each additional key before saving.

Once match keys are saved, the selections display in the Match Keys and Match Key Fields grids.

Match Keys Grid

The Match keys added in the previous section display here, showing each key’s Type (Exact or Fuzzy) and Key Name. The following actions can be taken from this grid:

  • Skip Fuzzy—if toggled to the On position, only one representative record from each exact key cluster (the first record added) is considered for fuzzy clustering.

  • Move Match Key—allows for reordering Match keys to run in the desired sequence.

  • Edit Match Key—displays the Match Keys dialog box, allowing you to edit the selected Match Key and any other Match key for the Match job.

  • Delete Match Key—removes the Match key from the list. This also removes the corresponding data in the Match Key Fields grid below.

Match Key Fields

The Match Key Fields selected in the previous section display here to indicate the Field Name, Function, Start, and Length. The following actions can be taken from this grid:

  • Optional—when toggled to the On position, marks the field as optional. All fields within the Match Key that are not marked optional are required (i.e., records containing a blank value in any required field are not clustered and compared).

  • Move Field—allows for reordering Match Key Fields to run in the desired sequence.

  • Delete Field—removes the Match Key Field from the list. This also removes the corresponding data in the Match Keys grid above.
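
As a rough illustration of how a key could be assembled from its fields, the hypothetical Python sketch below applies each field's Function, takes the substring given by Start and Length, and skips a record whose required field is blank. It assumes Start is 1-based and uses simple string functions in place of the real Match Key Functions; KeyField and build_key are illustrative names, not part of Match.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class KeyField:
    name: str                        # field name in the record
    function: Callable[[str], str]   # hypothetical stand-in for a Match Key Function
    start: int                       # assumed 1-based start position
    length: int
    optional: bool = False

def build_key(record: dict, fields: list) -> Optional[str]:
    """Concatenate the key parts; return None when a required field is blank."""
    parts = []
    for f in fields:
        value = (record.get(f.name) or "").strip()
        if not value:
            if f.optional:
                parts.append("")  # optional field may be blank
                continue
            return None  # blank required field: record is not clustered on this key
        transformed = f.function(value)
        parts.append(transformed[f.start - 1 : f.start - 1 + f.length])
    return "|".join(parts)

fields = [
    KeyField("last", str.upper, 1, 4),
    KeyField("postcode", lambda s: s.replace(" ", "").upper(), 1, 3),
    KeyField("first", str.upper, 1, 1, optional=True),
]
print(build_key({"last": "Smith", "postcode": "ab1 2cd", "first": ""}, fields))  # SMIT|AB1|
```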

Matching Rules

Matching rules consist of the constraints and weights, defined for each match level, that are used to score compared records.

Constraints

Choose one or more of the following constraints:

  • Must match gender—When selected, potential matches are disregarded if their genders differ. If the gender is unknown in one or both of the records, the records will potentially be classified as a match.

  • Must match suffix—When selected, potential matches will be disregarded if their suffixes differ. If the suffix is unknown in one or both of the records, the records will potentially be classified as a match.

  • Must match joint names—When selected, potential matches will be disregarded if one record has a joint name and the other does not. For example, normal behavior matches “Mr. and Mrs. J Smith” with “Mr. J Smith”; selecting this option prevents such matches.
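
A minimal Python sketch of these constraint checks follows, assuming each record is a dictionary with hypothetical gender, suffix, and joint fields. A blank or missing gender or suffix counts as unknown and never causes a rejection, matching the behavior described above.

```python
def passes_constraints(a, b, must_match_gender=False, must_match_suffix=False,
                       must_match_joint=False):
    """Return False if any selected constraint disqualifies the pair."""
    def known_and_different(x, y):
        # Unknown (blank or missing) values never cause a rejection.
        return bool(x) and bool(y) and x != y

    if must_match_gender and known_and_different(a.get("gender"), b.get("gender")):
        return False
    if must_match_suffix and known_and_different(a.get("suffix"), b.get("suffix")):
        return False
    if must_match_joint and bool(a.get("joint")) != bool(b.get("joint")):
        return False  # one record has a joint name and the other does not
    return True

print(passes_constraints({"gender": "M"}, {"gender": "F"}, must_match_gender=True))   # False
print(passes_constraints({"gender": "M"}, {"gender": None}, must_match_gender=True))  # True
```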

Weights

A total score is calculated when two records are compared; this is the sum of the scores generated for each component within the two records (e.g., name, organization, address, and postcode).

Components are scored using the following weights:

  • Sure—identical or equivalent components

  • Likely—very similar components

  • Possible—less similar components

  • One Empty—one record contains no data for the component

  • Both Empty—neither record contains data for the component

Scoring thresholds can be applied to each field to provide further matching requirements when two records are compared. The total score begins at 0. As the score for each component is added to the total score, the total score must equal or exceed that component’s specified threshold. If the total score is lower than a specified threshold, then the two records are rejected as a match and are scored as 0.

For example, when looking for duplicates in data containing individuals’ names, addresses, and postcodes, the initial total score is 0. The first component (name) is added to the total, and the total must then equal or exceed the name threshold. Using a threshold of 25 ensures that the names within all matching pairs score at least a Possible weight. The next component (organization) is then added to the total score, and its threshold is checked; a value of 0 (the default) effectively disables the threshold. The next component (address) is then added to the total; a threshold of 55 requires the running total of the name, organization, and address scores to reach at least 55. For a pair whose name scored exactly the minimum of 25 and whose organization scored 0, this means the address must score at least 30. Assuming the total score is 55 or more, the next component is checked. Each subsequent component’s score is added to the total score, and its threshold is checked each time.
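
The threshold logic in this example can be sketched in Python as a running total that returns 0 as soon as the total falls below a component's threshold. The component scores and thresholds are illustrative values, not Match defaults, and total_score is a hypothetical name.

```python
def total_score(component_scores, thresholds):
    """Add component scores in order, rejecting the pair (score 0) if any threshold is missed.

    component_scores: dict of component -> score, in comparison order.
    thresholds: dict of component -> threshold; missing components default to 0.
    """
    total = 0
    for component, score in component_scores.items():
        total += score
        if total < thresholds.get(component, 0):
            return 0  # rejected as a match
    return total

# Illustrative thresholds from the example above (0 disables a threshold).
THRESHOLDS = {"name": 25, "organization": 0, "address": 55, "postcode": 0}

print(total_score({"name": 25, "organization": 0, "address": 30, "postcode": 20}, THRESHOLDS))  # 75
print(total_score({"name": 25, "organization": 0, "address": 25, "postcode": 20}, THRESHOLDS))  # 0
print(total_score({"name": 40, "organization": 0, "address": 15, "postcode": 20}, THRESHOLDS))  # 75
```

The third call shows why the 30-point address requirement applies only when the name scores exactly 25: a stronger name score lets a weaker address still reach the running total of 55.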

Matching Matrices

The matching matrix allows you to configure the confidence that each type of name match achieves using the following combinations:

  • Name—LastName, FirstName, MiddleName

  • Company Name—Org Name1, Org Name2, Org Name3

The confidence level maps to the Match weights in the advanced settings, which ultimately determine the name score for that combination. For example, the following setting ensures identical names will be assigned the score associated with a SURE match as determined in the Weights configuration for the Match job:

[Equal LastName + Equal FirstName + Equal MiddleName = SURE]
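
To show how a matrix entry feeds into the name score, here is a hypothetical Python lookup. The comparison labels (such as Initial and Empty), the specific matrix entries, and the weight values are assumptions chosen for illustration; only the idea that a combination maps to a confidence level, which in turn maps to a weight, comes from this section.

```python
from enum import Enum

class Confidence(Enum):
    SURE = "Sure"
    LIKELY = "Likely"
    POSSIBLE = "Possible"

# Hypothetical matrix: (LastName, FirstName, MiddleName) comparison results -> confidence.
NAME_MATRIX = {
    ("Equal", "Equal", "Equal"): Confidence.SURE,
    ("Equal", "Equal", "Empty"): Confidence.LIKELY,
    ("Equal", "Initial", "Equal"): Confidence.LIKELY,
    ("Equal", "Approx", "Empty"): Confidence.POSSIBLE,
}

# Hypothetical name weights, standing in for the job's Weights configuration.
NAME_WEIGHTS = {Confidence.SURE: 40, Confidence.LIKELY: 30, Confidence.POSSIBLE: 25}

def name_score(last_cmp, first_cmp, middle_cmp):
    """Look up the confidence for a combination, then convert it to a weight (0 if unlisted)."""
    confidence = NAME_MATRIX.get((last_cmp, first_cmp, middle_cmp))
    return NAME_WEIGHTS[confidence] if confidence is not None else 0

print(name_score("Equal", "Equal", "Equal"))  # 40
```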

Post-matching Rules

Post-matching rules are applied to fuzzy matching pairs before grouping. Each rule specifies a condition, written in a SQL-like syntax, and an action that determines what happens when the condition is satisfied.

To add a Post-matching rule:

  1. Click the Add Post-matching rule button.

  2. On the Add Post-matching Rule dialog box, enter a Condition for the Post-matching rule.

  3. Select the appropriate action for the rule from the Action list box. Choose from:

    1. Delete—deletes the matching pair once the condition is met

    2. Keep—retains the matching pair once the condition is met

    3. Review—marks the matching pair for review once the condition is met

  4. Click the Save button.

The post-matching rules are added to the grid and can be edited or deleted using the Edit post-matching rule and Delete post-matching rule icons.
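
The rule evaluation can be sketched in Python as below. Each condition is a Python lambda standing in for the SQL-like condition (kept alongside as text), and the sketch assumes the first rule whose condition is true decides the action. The pair fields, the PostMatchRule type, and that ordering behavior are hypothetical, not documented Match behavior.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PostMatchRule:
    condition_text: str                # the SQL-like condition as a user might type it
    condition: Callable[[dict], bool]  # Python stand-in for that condition
    action: str                        # "Delete", "Keep", or "Review"

def apply_rules(pairs, rules):
    """Drop deleted pairs and tag the rest with the action that applied."""
    result = []
    for pair in pairs:
        # Assumption: the first rule whose condition is true decides the action.
        action = next((r.action for r in rules if r.condition(pair)), None)
        if action == "Delete":
            continue  # pair is removed before grouping
        result.append({**pair, "status": action or "Match"})
    return result

rules = [
    PostMatchRule("score < 85 AND postcode_a <> postcode_b",
                  lambda p: p["score"] < 85 and p["postcode_a"] != p["postcode_b"], "Delete"),
    PostMatchRule("score >= 95", lambda p: p["score"] >= 95, "Keep"),
    PostMatchRule("score < 90", lambda p: p["score"] < 90, "Review"),
]
pairs = [
    {"a": 1, "b": 2, "score": 80, "postcode_a": "AB1", "postcode_b": "ZZ9"},
    {"a": 3, "b": 4, "score": 97, "postcode_a": "AB1", "postcode_b": "AB1"},
    {"a": 5, "b": 6, "score": 88, "postcode_a": "CD2", "postcode_b": "CD2"},
    {"a": 7, "b": 8, "score": 92, "postcode_a": "EF3", "postcode_b": "EF3"},
]
for p in apply_rules(pairs, rules):
    print(p["a"], p["b"], p["status"])
```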

Generate Settings

The following settings are used when records are parsed and keys are generated:

  • Drop Excluded Words—When enabled, exclusion words such as “deceased”, “addressee”, and “gone away” are not included in key fields, so they do not affect record comparison.

  • Consider Casing—When enabled, casing of the incoming data is considered when splitting the data up for key extraction, proper casing, etc.

Generate Name

  • Use Equivalent Name—When enabled, the input first name is replaced with its equivalent from word lookup tables.

    Example

    “Tony Smith” and “Anthony Smith” are recognized as a match.

  • Process Blank Last Name—When enabled, forenames are used in place of a blank surname in key generation.

    Example

    When a record has a first name but no last name, the first name is treated as the last name, and match keys are generated rather than being left blank.

  • Non-hyphenated Double-barreled—When enabled, unrecognized middle names are considered part of a non-hyphenated, double-barreled last name.

    Example

    When the full name is “John Harrington Jones”, the last name is considered “Harrington-Jones” because “Harrington” is not a recognized first name.

  • Detect Inverse Name—When enabled, Match attempts to identify addressee names that have been specified with the last name preceding the first names, provided a comma delimiter follows the last name. Without a comma, a name is assumed to be in standard left-to-right format, with the first name preceding the last name.

    Example

    “Smith, John”, where Smith is the last name
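
The Generate Name options above can be approximated with the Python sketch below, which splits a full name into first names and a last name. The set of recognized first names is a hypothetical stand-in for Match's word lookup tables, and the function names and parsing rules are simplified assumptions.

```python
# Hypothetical stand-in for the recognized first names in Match's word lookup tables.
KNOWN_FIRST_NAMES = {"JOHN", "MARY", "ANTHONY"}

def split_name(full_name, detect_inverse=True, double_barreled=True):
    """Return (first_names, last_name) for a free-text addressee name."""
    full_name = full_name.strip()
    if detect_inverse and "," in full_name:
        # Detect Inverse Name: the last name precedes the comma.
        last, _, rest = full_name.partition(",")
        return rest.replace(",", " ").split(), last.strip()
    words = full_name.split()
    if not words:
        return [], ""
    first_names, last = words[:-1], words[-1]
    if double_barreled:
        # Non-hyphenated Double-barreled: unrecognized middle names join the last name.
        while len(first_names) > 1 and first_names[-1].upper() not in KNOWN_FIRST_NAMES:
            last = first_names.pop() + "-" + last
    return first_names, last

def surname_for_key(first_name, last_name, process_blank_last=True):
    """Process Blank Last Name: use the forename when the surname is blank."""
    if last_name:
        return last_name
    return first_name if process_blank_last else ""

print(split_name("Smith, John"))            # (['John'], 'Smith')
print(split_name("John Harrington Jones"))  # (['John'], 'Harrington-Jones')
```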

Generate Company

  • Use Equivalent Name—When enabled, equivalents of words indicating a type of business (e.g., “Motors” or “Services”) are included in the normalized organization name and the corresponding phonetic keys. These equivalents are derived from the word lookup tables.

    Example

    “Wood Green Cars” matches with “Wood Green Motors” because the word “cars” has an equivalent of “motors”. However, neither of these matches with “Wood Green Carpets” because there is no equivalent between “carpets” and “cars” or “motors”.

  • Normalization Truncation—When set to a non-zero value N and the organization name consists of more than three words, the third element of the normalized organization name is truncated so that each word after the first two keeps only its first N characters.

  • Ignore Parentheses—When enabled, any words enclosed in parentheses within an organization name are excluded from the phonetic organization keys. This is useful for company names such as “Remnel Ltd” and “Remnel (UK) Ltd”, because it ensures such records are compared when the phonetic organization keys are used as part of composite match keys.

  • Ignore Trailing Town—When enabled, trailing post towns are excluded from the phonetic organization keys.

    Example

    The phonetic organization keys for “Handso Ltd” and “Handso Essex Ltd” will be the same to help ensure such records are compared.

  • Legal Normalization—When enabled, the business type words (e.g., “Ltd” or “Inc”) and business words (e.g., “Motors” or “Services”) are included in the normalized organization name and the corresponding phonetic key.

    Note

    The Legal Normalization setting works in conjunction with the company Use Equivalent Name setting. If the Use Equivalent Name setting is disabled, the Legal Normalization setting is ignored.
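
A simplified Python sketch of three of these company options follows. It applies them to a single normalized string rather than to Match's separate normalized name and phonetic keys, uses a one-entry hypothetical equivalents table, and reflects one reading of Normalization Truncation (each word after the first two is cut to N characters).

```python
import re

# Hypothetical one-entry stand-in for the word lookup tables.
EQUIVALENTS = {"CARS": "MOTORS"}

def normalize_org(name, use_equivalents=True, ignore_parentheses=True, truncation=0):
    """Return a simplified normalized organization name."""
    if ignore_parentheses:
        # Ignore Parentheses: drop any "(...)" words such as "(UK)".
        name = re.sub(r"\([^)]*\)", " ", name)
    words = name.upper().split()
    if use_equivalents:
        # Use Equivalent Name: replace business words with their equivalents.
        words = [EQUIVALENTS.get(w, w) for w in words]
    if truncation and len(words) > 3:
        # Normalization Truncation: keep the first N characters of each word after the first two.
        words = words[:2] + [w[:truncation] for w in words[2:]]
    return " ".join(words)

print(normalize_org("Wood Green Cars"))   # WOOD GREEN MOTORS
print(normalize_org("Remnel (UK) Ltd"))   # REMNEL LTD
```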

Compare Settings

The following settings are used when records are compared:

Compare Name

  • Fuzzy Match Initials—This setting controls how similar-sounding initials (e.g., M/N, S/F, G/J) can be matched.

    • Full—When selected, one name’s initial is permitted to match the first letter of the other name’s first name. For example, “M Smith” vs. “Neil Smith”. This is the default setting.

    • Initials Only—When selected, only two initials are permitted to match. For example, “M Smith” vs. “N Smith”.

    • No Match—When selected, similar-sounding initials are not matched, so neither of the previous examples matches.

  • Initial Match Forename—This setting controls the result achieved when an initial matches the first letter of a first name.

    • Equal—When selected, an initial matching the first letter of a first name achieves the same result as an exact match of two full first names.

      Example

      “B Smith” vs. “Bob Smith” achieves the same result as “Bob Smith” vs. “Bob Smith”.

    • Approx—When selected, the resulting name score is reduced to that of an approximate match.

    • Contains—When selected, the resulting name score is reduced to that of a match in which the names contain the same string of characters.

  • Fuzzy Match Forename—This setting controls whether first names that are each recognized as different names can be fuzzy matched.

    Example

    The first names “Ron” and “Roy” are typically considered a fuzzy match, but they can be prevented from matching because both are recognized as individual first names.

    • Full Fuzzy Matching—When selected, first names are fuzzy matched regardless of whether one or both is recognized. This is the default setting.

    • No Fuzzy Matching—When selected, no fuzzy matching takes place.

    • Both Unrecognized—When selected, first names are fuzzy matched only when neither is recognized (e.g., “Rov” and “Row”).

    • Either Unrecognized—When selected, first names are fuzzy matched when at least one of them is unrecognized (e.g., “Ron” and “Rov”).

  • Blank Name Company Matching—This setting allows records that contain no addressee name to achieve a name score based on what is available in the job title and company name fields.

    Example

    When both records contain the job title “Managing Director” and the company name “Syniti”, a positive name score is given even though neither record contains an addressee.

    • Both Names are Blank—This is the default setting. A score is achieved only when both records contain no addressee names.

    • Either Name Blank—When selected, a score can be achieved when either record contains no addressee name.

    • Disabled—When selected, the job title and company name fields are not used to score records with blank addressee names.

  • Initial Match Equivalent—This setting controls how an initial matches with an equivalent name.

    • Equal—This is the default setting. An initial matches a first name even when it differs from the first letter of that name, provided the initial is the first letter of an equivalent first name.

      Example

      When comparing “Rebecca Smith” and “B Smith”, the initial “B” could be considered a match for “Becky”, which is a common abbreviation (or equivalent) of “Rebecca”.

    • Approx—Selecting this option reduces the resulting name scores so that such matches are more easily distinguished.

    • Unequal—Selecting this option reduces the resulting name scores so that such matches are more easily distinguished.

  • Prevent Mrs Matching Miss—When enabled, two compared names will not match if one has a title of “Mrs.” and the other has a title of “Miss”.

    Example

    “Mrs. J Smith” will not match with “Miss J Smith” when this setting is enabled.

  • Fuzzy Match Non-normalized Names—When enabled, additional matching checks are performed on names using the non-normalized name matching fields. This is useful when the Generate Name setting Use Equivalent Name is enabled: that setting allows names such as “Elizabeth” and “Lisa” to match, but does not allow some misspellings and typos, such as “Lsia”, to match.

  • Cross Match Initial to Name—When enabled, names are considered a match when the first letter of one record’s first name matches the other record’s middle initial.

    Example

    “Richard Smith” and “John R Smith” are considered a possible match.
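
The four Fuzzy Match Forename modes can be illustrated with the Python sketch below. The recognized-name set is hypothetical, and a Levenshtein edit distance of at most 1 is used as a stand-in for Match's own fuzzy comparison, which this section does not describe.

```python
# Hypothetical set of recognized first names.
RECOGNIZED = {"RON", "ROY", "JOHN"}

def edit_distance(a, b):
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def forenames_fuzzy_match(a, b, mode="Full Fuzzy Matching"):
    """Apply a Fuzzy Match Forename mode before the (stand-in) fuzzy comparison."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True  # identical names always match
    if mode == "No Fuzzy Matching":
        return False
    known_a, known_b = a in RECOGNIZED, b in RECOGNIZED
    if mode == "Both Unrecognized" and (known_a or known_b):
        return False
    if mode == "Either Unrecognized" and known_a and known_b:
        return False
    return edit_distance(a, b) <= 1  # stand-in for Match's fuzzy comparison

print(forenames_fuzzy_match("Ron", "Roy"))                             # True
print(forenames_fuzzy_match("Ron", "Roy", mode="Both Unrecognized"))   # False
```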

Compare Address

  • Loose Fuzzy Premise Match—When enabled, additional fuzzy premise matching is performed. Two premises match if:

    • they are numeric and differ by no more than two.

      Example

      When enabled, “1719” matches both “1720” and “1721”.

    • one premise starts with the other but contains extra trailing characters that do not extend the premise number.

      Example

      When enabled, “88” and “88/2” match, but “88” and “887” do not match.

  • Strict Premise Match—When enabled, stricter address matching occurs. This is recommended for address-level matching, but it can be used to force stricter address matching at other levels.
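
The two Loose Fuzzy Premise Match rules can be sketched in Python as follows. This is one interpretation consistent with the examples above, assuming premises are compared as plain strings; loose_premise_match is a hypothetical name, not Match's implementation.

```python
def loose_premise_match(a, b):
    """Return True if two premises match under the loose fuzzy rules."""
    a, b = a.strip(), b.strip()
    if a == b:
        return True
    if a.isdigit() and b.isdigit():
        # Rule 1: numeric premises differing by no more than two.
        return abs(int(a) - int(b)) <= 2
    shorter, longer = sorted((a, b), key=len)
    if not shorter or len(shorter) == len(longer):
        return False
    # Rule 2: one starts with the other, and the extra characters do not extend the number.
    return longer.startswith(shorter) and not longer[len(shorter)].isdigit()

print(loose_premise_match("1719", "1721"))  # True
print(loose_premise_match("88", "88/2"))    # True
print(loose_premise_match("88", "887"))     # False
```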

Grouping Options

Bridging Prevention

Bridging occurs when two matching pairs that share a common record are combined into one group, even though the other two records are not a good match for each other. For example, consider the following two matching pairs:

Pair 1

  • AR017900|MISS|SUSAN|DAVIS|

  • AY272090|MS| |DAVIS|

Pair 2

  • AK557942|MRS|KAREN|DAVIS|

  • AY272090|MS| |DAVIS|

Each pair is a good match in itself. However, when combined into a group based on the common record AY272090, a bridge is created between the two non-matching records as shown here:

Bridged Group

  • AR017900|MISS|SUSAN|DAVIS|

  • AK557942|MRS|KAREN|DAVIS|

  • AY272090|MS| |DAVIS|

Several situations can cause bridging. The common factor is that some information is missing from a record. Causes of bridging include:

  • Name bridging—A record is missing the forename and matches multiple records with different forenames (as in the previous example)

  • Prefix bridging—A record with “Ms” matches to a record with “Miss” and to a record with “Mrs” (where one of the forenames is empty or just an initial)

  • Company bridging—A record with a company name acronym or partial company name matches multiple company names

Bridging prevention splits bridged groups into sub-groups. Bridging prevention will not break apart matches flagged as Keep by a post-matching rule, as outlined in Post-matching Rules above.

Select any of the following to manage and prevent various types of bridging:

  • Name Bridging Prevention—prevents bridging caused by missing forenames

  • Prefix Bridging Prevention—prevents bridging caused by “Ms”; this takes effect only if Name Bridging Prevention is also enabled

  • Company Bridging Prevention—prevents bridging caused by company name acronyms

  • Aggressive Splitting—disassociates bridging records from all matching records. If left disabled, bridging records will remain matched to one sub-group of non-bridging records.
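
The grouping and Name Bridging Prevention behavior can be sketched in Python using a union-find structure, applied to the Davis example above. This is a simplified, hypothetical model. A record is treated as bridging when its forename is blank and it matches records with two or more different forenames. Non-aggressive mode attaches it to the sub-group of its alphabetically first partner, which is an arbitrary choice. Keep flags from post-matching rules are not modeled.

```python
from collections import defaultdict

def group_pairs(pairs, nodes=()):
    """Union-find: combine matching pairs that share a record into groups."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for n in nodes:
        find(n)  # register records that may end up alone
    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())

def prevent_name_bridging(pairs, forename, aggressive=False):
    """Split groups bridged by records with a blank forename; return matched groups."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    bridging = {
        r for r, ps in partners.items()
        if not forename[r] and len({forename[p] for p in ps if forename[p]}) > 1
    }
    core_pairs = [(a, b) for a, b in pairs if a not in bridging and b not in bridging]
    others = [r for r in partners if r not in bridging]
    groups = [set(g) for g in group_pairs(core_pairs, others)]
    if not aggressive:
        # Keep each bridging record matched to one sub-group of non-bridging records.
        for r in sorted(bridging):
            target = min(p for p in partners[r] if p not in bridging)
            next(g for g in groups if target in g).add(r)
    return sorted(sorted(g) for g in groups if len(g) > 1)

pairs = [("AR017900", "AY272090"), ("AK557942", "AY272090")]
forename = {"AR017900": "SUSAN", "AY272090": "", "AK557942": "KAREN"}
print(group_pairs(pairs))                                       # [['AK557942', 'AR017900', 'AY272090']]
print(prevent_name_bridging(pairs, forename))                   # [['AK557942', 'AY272090']]
print(prevent_name_bridging(pairs, forename, aggressive=True))  # []
```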

Note

When enabling bridging prevention, disable the Skip Fuzzy option on any exact keys in the Match Keys grid. Bridging prevention must know about all matching pairs to work properly.