Customer matching
by RS  admin@robinsnyder.com : 1024 x 640


1. Customer matching
Here are some general comments on matching in terms of customers.

The advantage of recognizing the general pattern is that design decisions, database structures, sequential and parallel algorithms, trade-offs, etc., fall into place, as they have been well studied and documented in the past. In general, matching falls under the category of equivalence relations and classes. An equivalence relation is a relation that is reflexive, symmetric, and transitive. An equivalence relation results in (i.e., induces) equivalence classes. There are known algorithms for the two primary operations on equivalence classes, union and find: find determines which class an item belongs to, and union merges two classes into one.

2. Tracking people
In tracking people, the relation is a "belongs to" relation where each class is a unique person.

So a name "belongs to" the person, an email address "belongs to" a person, etc.

The find operation would be, "who does this email address belong to?".

A union operation would be the merging of two classes, as when what had previously been considered two separate people (e.g., two separate social network accounts) are, at some point in time, determined to be the same person.

This involves the transitivity property of the relation.
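
The find and union operations described above can be sketched with a small union-find (disjoint-set) structure. This is a minimal illustration, not a definitive implementation: the class name IdentityClasses, the method same_person, and the identifier strings are all assumptions made for the example.

```python
class IdentityClasses:
    """Union-find over identifiers (names, email addresses, accounts)."""

    def __init__(self):
        # Maps each identifier to its parent; a root is its own parent.
        self.parent = {}

    def find(self, item):
        """Return the representative of the class that item belongs to."""
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        """Merge the classes containing a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_person(self, a, b):
        """True if a and b currently belong to the same class."""
        return self.find(a) == self.find(b)


# Hypothetical identifiers: two accounts, each with its own email address.
people = IdentityClasses()
people.union("account:first", "email:pat@example.com")
people.union("account:second", "email:pat.work@example.com")
separate_before = people.same_person("account:first", "account:second")

# Later evidence shows the two email addresses belong to one person.
people.union("email:pat@example.com", "email:pat.work@example.com")
merged_after = people.same_person("account:first", "account:second")
```

After the last union, the two accounts fall into the same class even though they were never linked directly, which is the transitivity property at work.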

3. Group segmentation
In group segmentation, the relation may be "belongs to an age group".

The find operation is "to what age group does a person belong?" or "who are the members of this age group?".

The union operation is to add a new person to an age group once that person's age is determined (e.g., from date of birth).
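
As a rough sketch of the age-group case, the following keeps two lookup tables so that both forms of find are direct. The age bands and the names age_on, age_group, and add_person are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def age_on(birth_date, today):
    """Age in whole years on the given day."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age):
    """Map an age to a (hypothetical) age band."""
    if age < 18:
        return "under 18"
    if age < 35:
        return "18-34"
    if age < 65:
        return "35-64"
    return "65+"


members = defaultdict(set)  # age group -> people in it
group_of = {}               # person -> age group


def add_person(person, birth_date, today):
    """Union step: place a person in a group once the age is known."""
    group = age_group(age_on(birth_date, today))
    group_of[person] = group
    members[group].add(person)
    return group


today = date(2024, 6, 14)
add_person("pat", date(2000, 6, 15), today)
add_person("sam", date(1950, 1, 1), today)
```

Here group_of answers "to what age group does a person belong?" and members answers "who are the members of this age group?".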

4. IS-A and HAS-A
In database modeling terms, the matching relationships in a system fall into the categories of "HAS-A" or "IS-A" relationships. Each is modeled/realized differently in a database implementation.

5. HAS-A
For example, a person "HAS-A" social security number, a person "has an" email address, etc.

Whenever the relationship, or association, is 1 to 1 (e.g., the way Social Security Numbers were designed, to identify a person), then in "A has a B", the B can be stored in the same table row as the A.

If the relationship is 1 to many (i.e., B functionally determines A), then a separate table (i.e., intersection table) is necessary, with a pointer/link from the B to the A. If A to B is 1 to 0 (info is missing or optional) or 1 to 1 (info is available), then either method can be used (i.e., a null in the table or an auxiliary table). Purists (e.g., relational database purists) advocate the additional table (to maintain strict normal form), while many prefer the null in the table cell.
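
The two HAS-A layouts can be sketched with SQLite from the Python standard library. The table names, column names, and sample rows below are assumptions for illustration only; the optional value is stored as a null in the row, which is one of the two choices discussed above.

```python
import sqlite3

has_a_db = sqlite3.connect(":memory:")
has_a_db.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    ssn       TEXT UNIQUE        -- 1 to 1: same row; NULL when missing
);
CREATE TABLE email (
    address   TEXT PRIMARY KEY,  -- each address determines one person
    person_id INTEGER NOT NULL REFERENCES person(person_id)
);
""")

# Placeholder data, not real identifiers.
has_a_db.execute("INSERT INTO person VALUES (1, 'Pat Example', '000-00-0000')")
has_a_db.execute("INSERT INTO person VALUES (2, 'Sam Example', NULL)")
has_a_db.executemany(
    "INSERT INTO email VALUES (?, ?)",
    [("pat@example.com", 1), ("pat.work@example.com", 1)],
)


def owner_of(address):
    """Follow the link from an email address back to its person."""
    row = has_a_db.execute(
        "SELECT p.name FROM email e "
        "JOIN person p ON p.person_id = e.person_id "
        "WHERE e.address = ?",
        (address,),
    ).fetchone()
    return row[0] if row else None
```

A call such as owner_of("pat.work@example.com") follows the link from the B (the address) to the A (the person).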

6. IS-A
The "IS-A" relationship is usually modeled as an object hierarchy since, for example, many people can be members of a group (i.e., a person "is a" member of a group) and that group can be part of a larger group, etc.

Such relationships are many to 1 (perhaps with multiple levels) and are typically implemented/realized in a database using an auxiliary intersection table.
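
One possible realization of a multi-level, many to 1 hierarchy is a single auxiliary table of child/parent links, walked with a recursive query. The table name member_of, the function groups_of, and the sample groups are hypothetical.

```python
import sqlite3

is_a_db = sqlite3.connect(":memory:")
is_a_db.executescript("""
CREATE TABLE member_of (
    child  TEXT PRIMARY KEY,  -- many children may share one parent
    parent TEXT NOT NULL
);
INSERT INTO member_of VALUES
    ('pat', 'sales team'),
    ('sam', 'sales team'),
    ('sales team', 'marketing division'),
    ('marketing division', 'company');
""")


def groups_of(member):
    """List every group the member belongs to, nearest level first."""
    rows = is_a_db.execute(
        """
        WITH RECURSIVE ancestors(name, depth) AS (
            SELECT parent, 1 FROM member_of WHERE child = ?
            UNION ALL
            SELECT m.parent, a.depth + 1
            FROM member_of m JOIN ancestors a ON m.child = a.name
        )
        SELECT name FROM ancestors ORDER BY depth
        """,
        (member,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because each child has exactly one parent, the walk from any member up to the top level follows a single chain.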

7. Implementations
In more flexible implementations/situations, the relations are modeled as dynamic sets using meta-tables (e.g., name-value pair lists, sometimes called property lists) and not as fixed fields in separate tables in a database (which is machine efficient but inflexible for dynamic situations).
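
A name-value pair table of this kind might look like the following sketch. The table name property and the helper functions are assumptions for illustration; new kinds of information can be recorded without changing the schema, at the cost of storing every value as text.

```python
import sqlite3

meta_db = sqlite3.connect(":memory:")
meta_db.execute(
    "CREATE TABLE property ("
    " entity_id INTEGER NOT NULL,"
    " name TEXT NOT NULL,"
    " value TEXT NOT NULL)"
)


def set_property(entity_id, name, value):
    """Record one name-value pair for an entity."""
    meta_db.execute(
        "INSERT INTO property VALUES (?, ?, ?)", (entity_id, name, value)
    )


def properties(entity_id):
    """Return the entity's property list as {name: [values...]}."""
    result = {}
    rows = meta_db.execute(
        "SELECT name, value FROM property WHERE entity_id = ? ORDER BY rowid",
        (entity_id,),
    )
    for name, value in rows:
        result.setdefault(name, []).append(value)
    return result


def find_entities(name, value):
    """Find operation: which entities have this name-value pair?"""
    rows = meta_db.execute(
        "SELECT DISTINCT entity_id FROM property "
        "WHERE name = ? AND value = ? ORDER BY entity_id",
        (name, value),
    )
    return [row[0] for row in rows]


set_property(1, "email", "pat@example.com")
set_property(1, "email", "pat.work@example.com")
set_property(1, "loyalty_tier", "gold")  # a new attribute, no schema change
set_property(2, "email", "sam@example.com")
```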

8. Matching
The actual matching relation can be done in many ways. Here are some examples. With incomplete information, a probability can/should be placed on a potential match. This is often done with some form of Bayesian analysis/inference. More elaborate probability matching can be done with Bayesian networks, where the probability calculations become more involved and all depend on the assumptions made in the model.
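
As one simple form of Bayesian inference, the sketch below updates the odds of a match with one likelihood ratio per compared field. It assumes the fields are conditionally independent, and every number in it (the prior and the per-field probabilities) is hypothetical; in practice such values would have to be estimated from data.

```python
def match_probability(prior, evidence):
    """Posterior probability that two records refer to the same person.

    prior:    probability of a match before comparing any fields.
    evidence: iterable of (agrees, m, u) tuples, one per field, where
              m = P(field agrees | same person) and
              u = P(field agrees | different people).
    Assumes the fields are conditionally independent.
    """
    odds = prior / (1.0 - prior)
    for agrees, m, u in evidence:
        if agrees:
            odds *= m / u
        else:
            odds *= (1.0 - m) / (1.0 - u)
    return odds / (1.0 + odds)


# Hypothetical comparison of two records.
evidence = [
    (True, 0.95, 0.001),  # email addresses agree
    (False, 0.90, 0.05),  # names disagree
]
probability = match_probability(prior=0.01, evidence=evidence)
```

An agreeing email address raises the odds sharply because chance agreement is rare, while the disagreeing name pulls them back down; how far each moves the result depends entirely on the assumed m and u values.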

9. Situations
In such situations, there is a need for some probability to be associated with each piece of information. In most database implementations, the implicit assumption is that the data is 100% accurate although it is well known that almost all databases contain errors (euphemistically called anomalies in database terminology).
