Match Job Advanced Settings
  • 09 Aug 2024

Overview

Advanced settings can be configured on a Match job after the initial save of the job. To access advanced settings, navigate to the Edit Match job page and click on the Advanced settings button in the Standard Settings section.

Matching Score

Match uses scoring to control what is presented as a potential match, reducing false positive matches and increasing correctly matched data. During the run of the Match job, records with the same match keys are grouped together and then compared with each other. Refer to Match Keys below for more details on match keys.

Match uses the scoring weights for specific fields to determine an overall match score. If the score reaches the defined threshold score for a given match level, then a matching pair is written to the results along with the score achieved.

The following score details must be set for a Match job:

  • Minimum Score—sets the minimum score for records to be recognized as a potential match. For example, if the minimum score setting is 80, then the combined component scores must reach at least 80 for the records to be considered a match.

  • Max Cluster Size—sets the maximum size that a cluster can be before being reported as a large cluster. Clusters are the groupings of records created by the initial matching done on the match keys.

  • Output % Score—when toggled to the On position, the overall score is displayed on the Match job record results to show how similar records are.
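
As a rough illustration of how these settings interact, the sketch below groups records by a match key, flags clusters larger than a maximum size, and keeps only the pairs whose score reaches the minimum. The record layout, the setting values, and the `toy_score` function are hypothetical stand-ins, not the product's implementation.

```python
from collections import defaultdict
from itertools import combinations

MINIMUM_SCORE = 80     # assumed setting value, for illustration only
MAX_CLUSTER_SIZE = 3   # assumed setting value, for illustration only

# Hypothetical records: (record id, match key, name)
records = [
    ("A1", "DAV|90210", "SUSAN DAVIS"),
    ("A2", "DAV|90210", "SUSAN DAVIS"),
    ("A3", "DAV|90210", "KAREN DAVIS"),
    ("B1", "SMI|10001", "JOHN SMITH"),
]

def toy_score(a, b):
    """Stand-in for real scoring: 100 if the names are identical, else 50."""
    return 100 if a[2] == b[2] else 50

# Group records that share a match key into clusters.
clusters = defaultdict(list)
for rec in records:
    clusters[rec[1]].append(rec)

# Clusters above the maximum size would be reported as large clusters.
large_clusters = [key for key, recs in clusters.items() if len(recs) > MAX_CLUSTER_SIZE]

# Compare records only within their own cluster; keep pairs that reach the minimum.
matches = [
    (a[0], b[0], toy_score(a, b))
    for recs in clusters.values()
    for a, b in combinations(recs, 2)
    if toy_score(a, b) >= MINIMUM_SCORE
]
```

Here only the pair A1/A2 is kept, and no cluster is large enough to be flagged.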

Match Keys

Match keys are defined fields in each record used to compare and group potentially matching records. Refer to Match Keys Configuration for details on available match keys.

To add a Match key:

  1. Click the Add key button.

  2. On the Add Key dialog box, select the Type. Choose between Fuzzy and Exact to determine whether the match needs to be an exact match (Exact) or a similar match (Fuzzy).

  3. Choose a field type from the Fields list box.

  4. Click the Add button. The Field type is added to the grid.

  5. On the field name added to the grid, select the Function. Refer to Match Key Functions for more details.

  6. Enter the desired Start and Length for the key.

  7. Toggle the Optional setting to the On position if the function for the key is optional for matching results. Otherwise, leave this setting in the Off position.

  8. Click the Save button to save the match key(s).

Note

More than one key can be added on the Add Key dialog box before saving. Repeat steps 2 through 7 above to add multiple keys before saving.

Once match keys are saved, the selections display in the Match Keys and Match Key Fields grids.
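
To make the roles of Function, Start, and Length concrete, here is a minimal sketch of how a key could be assembled from record fields. It assumes Start is 1-based and uses upper-casing as a placeholder function; the actual functions are described in Match Key Functions, and the field names and helpers here are hypothetical.

```python
def key_part(value, start, length, function=str.upper):
    """Apply a function to a field, then take `length` characters from a 1-based `start`."""
    transformed = function(value or "")
    return transformed[start - 1:start - 1 + length]

def build_key(record, fields):
    """Concatenate the key parts of each configured field, separated by '|'."""
    return "|".join(
        key_part(record.get(name), start, length) for name, start, length in fields
    )

# Hypothetical key: first 3 characters of the last name plus first 5 of the postcode.
fields = [("last_name", 1, 3), ("postcode", 1, 5)]

key1 = build_key({"last_name": "Davis", "postcode": "90210-1234"}, fields)
key2 = build_key({"last_name": "DAVISON", "postcode": "90210"}, fields)
```

Both records produce the key `DAV|90210`, so they would be grouped together and compared.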

Match Keys Grid

The Match keys selected in the section above display here to indicate Type (Exact or Fuzzy) and Key Name. The following actions can be taken from this grid:

  • Skip Fuzzy—if toggled to the On position, only one representative record from each exact key cluster (the first record added) is considered for fuzzy clustering.

  • Move Match Key—allows for reordering Match keys to run in the desired sequence.

  • Edit Match Key—displays the Match Keys dialog box, allowing you to edit the selected Match Key and any other Match key for the Match job.

  • Delete Match Key—removes the Match key from the list. This also removes the corresponding data in the Match Key Fields grid below.
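
The effect of Skip Fuzzy can be pictured with a small sketch. It assumes exact key clusters are held as lists of record IDs in insertion order; the data and function name are hypothetical.

```python
def fuzzy_candidates(exact_clusters, skip_fuzzy):
    """Return the record IDs that would be passed on to fuzzy clustering."""
    if skip_fuzzy:
        # Only the first record added to each exact key cluster represents it.
        return [ids[0] for ids in exact_clusters.values()]
    return [rec_id for ids in exact_clusters.values() for rec_id in ids]

# Hypothetical exact key clusters.
exact_clusters = {"DAV|90210": ["A1", "A2", "A3"], "SMI|10001": ["B1"]}

with_skip = fuzzy_candidates(exact_clusters, skip_fuzzy=True)
without_skip = fuzzy_candidates(exact_clusters, skip_fuzzy=False)
```

With the option on, fewer records reach the fuzzy stage, which also means fewer matching pairs are generated.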

Match Key Fields

The Match Key Fields selected in the previous section display here to indicate the Field Name, Function, Start, and Length. The following actions can be taken from this grid:

  • Optional—if toggled to the On position, marks the field as optional. All fields within the Match Key that are not optional are required (i.e., records with a blank value in a required field are not clustered and compared).

  • Move Field—allows for reordering Match Key Fields to run in the desired sequence.

  • Delete Field—removes the Match Key Field from the list. This also removes the corresponding data in the Match Keys grid above.
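
One way to read the Optional setting is sketched below: a blank value in a required field means no key is produced for the record, while a blank optional field is tolerated. This reading, the field names, and the helper are assumptions made for illustration.

```python
def build_key(record, fields):
    """Return a key, or None when a required (non-optional) field is blank."""
    parts = []
    for name, optional in fields:
        value = (record.get(name) or "").strip().upper()
        if not value and not optional:
            return None  # record is left out of clustering for this key
        parts.append(value)
    return "|".join(parts)

# Hypothetical key: last_name is required, first_name is optional.
fields = [("last_name", False), ("first_name", True)]

keyed = build_key({"last_name": "Davis", "first_name": ""}, fields)
skipped = build_key({"last_name": "", "first_name": "Susan"}, fields)
```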

Matching Rules

Matching rules consist of constraints and weights for each matching level that are used when compared records are scored.

Constraints

Choose one or more of the following constraints:

  • Must match gender—When selected, potential matches are disregarded if their genders differ. If the gender is unknown in one or both of the records, the records can still be classified as a match.

  • Must match suffix—When selected, potential matches are disregarded if their suffixes differ. If the suffix is unknown in one or both of the records, the records can still be classified as a match.

  • Must match joint names—When selected, potential matches are disregarded if one record has a joint name and the other does not. For example, the default behavior matches “Mr. and Mrs. J Smith” with “Mr. J Smith”; selecting this option prevents such matches.
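
The gender and suffix constraints share one pattern: differing values block a match, but an unknown value on either side does not. A minimal sketch of that logic, with hypothetical field names and helpers:

```python
def passes_constraint(value_a, value_b):
    """An unknown (empty or missing) value on either side never blocks a match."""
    if not value_a or not value_b:
        return True
    return value_a == value_b

def passes_constraints(rec_a, rec_b, must_match=("gender", "suffix")):
    """Check every selected constraint; all must pass for the pair to stay a candidate."""
    return all(passes_constraint(rec_a.get(f), rec_b.get(f)) for f in must_match)

differs = passes_constraints({"gender": "F"}, {"gender": "M"})
unknown = passes_constraints({"gender": "F"}, {"gender": None})
suffix_differs = passes_constraints({"suffix": "JR"}, {"suffix": "SR"})
```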

Weights

A total score is calculated when two records are compared; this is the sum of the scores generated for each component within the two records (for example, name, organization, address, and postcode).

Components are scored using the following weights:

  • Sure—identical or equivalent components

  • Likely—very similar components

  • Possible—less similar components

  • One Empty—one record contains no data for the component

  • Both Empty—both records contain no data for the component

Scoring thresholds can be applied to each component to provide further matching requirements when two records are compared. The total score begins at 0. As the score for each component is added to the total score, the total score must match or exceed the component’s specified threshold. If the total score is lower than a specified threshold, then the two records are rejected as a match and are scored as 0.

For example, when looking for duplicates in data containing individuals’ names, addresses, and postcodes, the initial total score is 0. The first component (name) is added to the total, and the total must then match or exceed the name threshold. Using a threshold of 25 ensures that the names within all matching pairs score at least a Possible weight. The next component (organization) is then added to the total score, and the threshold is checked; a value of 0 (the default) effectively disables the threshold. The next component (address) is then added to the total score; a threshold of 55 requires the running total to reach 55, so if the name scored exactly 25 and the organization added nothing, the address must score 30 or more. Assuming the total score is 55 or more, the next component is checked. Each subsequent component’s score is added to the total score and its threshold is checked each time.
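
The running-total check in this example can be sketched as follows. The thresholds mirror the example (name 25, organization 0, address 55); the component scores and the function name are hypothetical.

```python
def total_score(component_scores, thresholds):
    """Add each component score in order; reject (score 0) if the running total
    falls below that component's threshold."""
    total = 0
    for component, score in component_scores:
        total += score
        if total < thresholds.get(component, 0):
            return 0  # rejected as a match
    return total

thresholds = {"name": 25, "organization": 0, "address": 55}

# Name 25 + organization 0 + address 30 reaches the address threshold of 55.
accepted = total_score([("name", 25), ("organization", 0), ("address", 30)], thresholds)

# Name 25 + organization 0 + address 20 totals 45, below 55, so the pair is rejected.
rejected = total_score([("name", 25), ("organization", 0), ("address", 20)], thresholds)

# A name score of 10 fails the first threshold immediately.
weak_name = total_score([("name", 10), ("organization", 0), ("address", 50)], thresholds)
```

Because the check is cumulative, a stronger name score lowers the address score needed to pass the address threshold.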

Matching Matrices

The matching matrix allows you to configure the confidence that each type of name match achieves using the following combinations:

  • Name—LastName, FirstName, MiddleName

  • Company Name—Org Name1, Org Name2, Org Name3

The confidence level maps to the Match weights in the advanced settings, which ultimately determine the name score for that combination. For example, the following setting ensures identical names will be assigned the score associated with a SURE match as determined in the Weights configuration for the Match job:

[Equal LastName + Equal FirstName + Equal MiddleName = SURE]
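
Conceptually, the matrix is a lookup from a combination of per-field comparison results to a confidence level, which is then translated into a score through the Weights settings. In the sketch below, only the all-Equal row comes from the example above; the other rows, the comparison labels, and the weight values are hypothetical.

```python
# Hypothetical weight values for each confidence level.
WEIGHTS = {"SURE": 40, "LIKELY": 30, "POSSIBLE": 25}

# Keys are (LastName, FirstName, MiddleName) comparison results.
NAME_MATRIX = {
    ("Equal", "Equal", "Equal"): "SURE",        # from the example above
    ("Equal", "Equal", "Empty"): "LIKELY",      # hypothetical row
    ("Equal", "Initial", "Empty"): "POSSIBLE",  # hypothetical row
}

def name_score(last, first, middle):
    """Look up the confidence level for a combination and return its weight (0 if unlisted)."""
    level = NAME_MATRIX.get((last, first, middle))
    return WEIGHTS[level] if level else 0

identical = name_score("Equal", "Equal", "Equal")
unlisted = name_score("Different", "Equal", "Equal")
```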

Post-matching Rules

Post-matching rules are applied to fuzzy matching pairs before grouping. Each rule specifies a condition, written in a SQL-like syntax, and an action that determines what happens when the condition is satisfied.

To add a Post-matching rule:

  1. Click the Add Post-matching rule button.

  2. On the Add Post-matching Rule dialog box, enter a Condition for the Post-matching rule.

  3. Select the appropriate action for the rule from the Action list box. Choose from:

    1. Delete—deletes the attribute once the condition is met

    2. Keep—retains the attribute once the condition is met

    3. Review—marks the attribute for review once the condition is met

  4. Click the Save button.

The post-matching rules are added to the grid and can be edited or deleted using the Edit post-matching rule and Delete post-matching rule icons.
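
The condition-plus-action structure of a rule can be sketched as below. Because the SQL-like condition syntax is not shown here, conditions are modeled as plain Python functions over a matching pair. The field names, the rules themselves, and the "first satisfied rule wins" ordering are assumptions.

```python
def same_email(a, b):
    """Hypothetical condition: both records carry the same non-empty email."""
    return bool(a["email"]) and a["email"] == b["email"]

def different_initials(a, b):
    """Hypothetical condition: both forenames are present and start with different letters."""
    fa, fb = a["first_name"], b["first_name"]
    return bool(fa) and bool(fb) and fa[0] != fb[0]

# Each rule pairs a condition with one of the actions: Delete, Keep, or Review.
RULES = [(same_email, "Keep"), (different_initials, "Delete")]

def apply_rules(a, b, rules=RULES):
    """Return the action of the first rule whose condition is satisfied, else None."""
    for condition, action in rules:
        if condition(a, b):
            return action
    return None

susan = {"first_name": "SUSAN", "email": ""}
karen = {"first_name": "KAREN", "email": ""}
blank = {"first_name": "", "email": ""}

delete_action = apply_rules(susan, karen)
no_action = apply_rules(susan, blank)
```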

Grouping Options

Bridging Prevention

Bridging happens when two matching pairs are combined by a common record to form a group where the other two records are not a good match. For example, consider the following two matching pairs:

Pair 1

  • AR017900|MISS|SUSAN|DAVIS|

  • AY272090|MS| |DAVIS|

Pair 2

  • AK557942|MRS|KAREN|DAVIS|

  • AY272090|MS| |DAVIS|

Each pair is a good match in itself. However, when combined into a group based on the common record AY272090, a bridge is created between the two non-matching records as shown here:

Bridged Group

  • AR017900|MISS|SUSAN|DAVIS|

  • AK557942|MRS|KAREN|DAVIS|

  • AY272090|MS| |DAVIS|
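
The way a shared record pulls two pairs into one group can be reproduced with a generic connected-components pass over the matching pairs. This union-find sketch is a common technique for that kind of grouping and is not necessarily how Match groups records internally.

```python
def group_pairs(pairs):
    """Merge matching pairs into groups of transitively connected records."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for rec in list(parent):
        groups.setdefault(find(rec), set()).add(rec)
    return list(groups.values())

# Pair 1 and Pair 2 from the example; AY272090 appears in both.
pairs = [("AR017900", "AY272090"), ("AK557942", "AY272090")]
groups = group_pairs(pairs)
```

The result is a single group of three records, even though AR017900 (SUSAN) and AK557942 (KAREN) were never matched to each other directly.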

Several situations can cause bridging. The common factor is that some information is missing from a record. Causes of bridging include:

  • Name bridging—A record is missing the forename and matches multiple records with different forenames (as in the previous example)

  • Prefix bridging—A record with “Ms” matches to a record with “Miss” and to a record with “Mrs” (where one of the forenames is empty or just an initial)

  • Company bridging—A record with a company name acronym or partial company name matches multiple company names

Bridging prevention splits bridged groups into sub-groups. Bridging prevention will not break apart matches flagged as Keep by a post-matching rule as outlined in the previous section.

Select any of the following to manage and prevent various types of bridging:

  • Name Bridging Prevention—prevents bridging caused by missing forenames

  • Prefix Bridging Prevention—prevents bridging caused by “Ms”; takes effect only if Name Bridging Prevention is also enabled

  • Company Bridging Prevention—prevents bridging caused by company name acronyms

  • Aggressive Splitting—disassociates bridging records from all matching records. If left disabled, bridging records will remain matched to one sub-group of non-bridging records.
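
A simplified picture of name bridging prevention, with and without aggressive splitting, is sketched below. It assumes sub-groups are formed by forename and that a non-aggressive split leaves the bridging record with the first sub-group; the product's actual choice of sub-group is not documented here, so treat both as illustrative assumptions.

```python
def split_name_bridges(group, aggressive=False):
    """Split a group of (record_id, forename) tuples into sub-groups by forename."""
    subgroups = {}
    bridging = []
    for rec_id, forename in group:
        name = forename.strip().upper()
        if name:
            subgroups.setdefault(name, []).append(rec_id)
        else:
            bridging.append(rec_id)  # missing forename: a potential bridge

    result = list(subgroups.values())
    if len(result) < 2:
        return [[rec_id for rec_id, _ in group]]  # no conflicting forenames, nothing to split
    if aggressive:
        # Bridging records are separated from every sub-group.
        result.extend([rec_id] for rec_id in bridging)
    else:
        # Assumption: bridging records stay with the first sub-group.
        result[0].extend(bridging)
    return result

bridged = [("AR017900", "SUSAN"), ("AK557942", "KAREN"), ("AY272090", " ")]
default_split = split_name_bridges(bridged)
aggressive_split = split_name_bridges(bridged, aggressive=True)
```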

Note

When enabling bridging prevention, it is important to disable the Skip Fuzzy option on any exact keys in the Match Keys section. Bridging prevention must know about all matching pairs to work properly.

