Chapter 2: ID Unification

Why build a single customer view… and how to?

We often see different teams across a business holding a different understanding of their customers, and even differing views within a single team depending on the platform being used. This results from each team having only a partial view, built from fragmented behaviour tracked in different tools.

Building a complete picture of an individual customer’s online and offline interactions provides the foundation for building an optimal customer experience. It allows for personalised and consistent messaging to be sent to customers at the right time and in the right place. In addition, a single customer view can improve operational efficiency and effectiveness, and provides the basis for models that analyse past behaviour and predict future behaviour.

However, a single customer view is not possible without connecting data sources to unify fragmented data. If our datasets contain unique identifiers, we can use one as a primary key to link the data deterministically, making exact matches for an individual across datasets. Where unique identifiers are lacking, it is still possible to gain an approximate view of a single customer using probabilistic matching. While this will not provide as accurate a match, the approach has benefits where wider reach is required and personalisation requirements are lower.

This article will focus on deterministic matching, with customer identifiers as the primary key, so we’ll take a look at what to use and when.
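To make the idea of deterministic matching concrete, here is a minimal sketch in Python. It assumes two hypothetical datasets (CRM records and web orders) that both carry an email address, and it links them only on an exact match of the normalised email. The field names and the normalisation rule are our own illustrative assumptions, not the behaviour of any particular platform.

```python
def normalise_email(email: str) -> str:
    """Trim whitespace and lower-case so trivially different spellings still match."""
    return email.strip().lower()


def deterministic_match(crm_rows, web_rows):
    """Link web orders to CRM customers where the normalised email matches exactly.

    Returns a list of (customer_id, order_id) pairs. Rows without an exact
    match are left unlinked rather than guessed at.
    """
    # Index the CRM records by the matching key (hypothetical field names).
    by_email = {normalise_email(row["email"]): row for row in crm_rows}

    matches = []
    for row in web_rows:
        key = normalise_email(row["email"])
        if key in by_email:
            matches.append((by_email[key]["customer_id"], row["order_id"]))
    return matches
```

Anything that does not match exactly stays unlinked, which is what gives this approach its accuracy and also what limits its reach.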

Current Tracking Landscape

Tracking (and retargeting) unique customers is becoming increasingly difficult due to the ePrivacy Regulation, GDPR, Intelligent Tracking Prevention (ITP) and App Tracking Transparency (ATT). There are a million articles that go into the details of these, so we won’t cover those here. However, don’t forget to look at our recent articles relating to these topics: “Why are 3rd party cookies dying and what do I need to do about it?” and “Is server-side tracking the answer to all my cookie and privacy woes?”

The primary considerations for building a single customer view with these regulations and protective technologies in mind are:

Obtaining explicit opt-in for particular uses of collected data
The use of cookies and browser storage that could be used to track users across domains
Potential disruption in app login flows using OAuth or similar

The type of interaction (e.g. in-store, through a customer service telephone call or onsite) and the authentication state (i.e. whether they are logged into their website account), alongside the considerations above, will determine which identifiers are available and how they can be tracked and used.

What Identifiers?

There are many potential identifiers that could be used to identify an individual and link their behaviour together across devices or online/offline interactions. Here are some common examples:

Email address
Names
Delivery address
IP address
Mobile number
Customer account ID
Cookies
Device IDs

These vary in usefulness and in their privacy implications, depending on how unique and persistent they are and on whether customers have given informed consent for their use. Customer Data Platforms (CDPs) primarily use deterministic matching to link profile data fragments into unified profiles. This means that a unique and persistent identifier is used to stitch together information about a customer’s behaviour across all potential touchpoints and systems. Because data points are only combined when the identifier matches exactly, this approach gives high accuracy and confidence in the data.

More advanced solutions may also offer probabilistic matching, to be used alone or in combination with a hierarchy of other IDs. This may be available from the platform directly, or through integrations with third party providers that onboard external device graphs.

Email addresses or customer account IDs are most commonly used as primary identifiers, as these are likely to represent a single individual. By contrast, multiple customers may share the same name, personal delivery address or IP address, and cookies or device IDs only persist for as long as they’re not deleted or overwritten. On their own, therefore, these latter examples are not strong options for a primary matching key.

However, the identifiers available depend on the situation in which the customer is interacting with the business. Let’s take a look at some example identifiers that would be available in various scenarios:

In-store
A customer may sign up for a loyalty account in-store, which would typically result in them providing an email address, name, physical address and telephone number.
Call centre
When a customer contacts a call centre, it is likely that a telephone number and customer account ID could be available.
Onsite or apps
Anonymous identifiers stored in browser storage, such as cookies, or device IDs on apps are available and can be tracked while a user is not authenticated. If a customer logs into an account, or submits their information by signing up to a newsletter or making a purchase, then a customer account ID or email address is also likely to be available.

Tracking these identifiers into profile fragments

Customer interactions and the context relating to these, such as the products they have browsed, should be tracked with the most persistent and unique available identifier at that time. The more persistent and unique the identifier is, the more accurate the view of the unique customer will be.
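As a rough illustration of choosing the most persistent and unique identifier available at the time, the sketch below walks a priority list and returns the first identifier that is present. The identifier names and their exact ordering are assumptions made for this example; your own hierarchy should reflect what your touchpoints actually capture.

```python
from typing import Optional, Tuple

# Hypothetical priority order, strongest first. Account ID and email are
# treated as primary IDs; device and cookie IDs are anonymous fall-backs.
IDENTIFIER_PRIORITY = ["customer_account_id", "email", "device_id", "cookie_id"]


def pick_identifier(available: dict) -> Optional[Tuple[str, str]]:
    """Return (identifier_type, value) for the strongest identifier present.

    Empty or missing values are skipped. Returns None when nothing usable
    was captured for this interaction.
    """
    for id_type in IDENTIFIER_PRIORITY:
        value = available.get(id_type)
        if value:
            return id_type, value
    return None
```

For example, an interaction that carries both a cookie ID and an email address would be tracked against the email address, while an anonymous page view would fall back to the cookie ID.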

This data is tracked as individual profile fragments, each capturing one part of the customer’s behaviour. While these don’t provide a complete picture of a unique user, they can still be used in analysis and targeted activities. This is where probabilistic capabilities within a CDP can be hugely valuable as a fall-back, because the volume of data from unauthenticated users is often greater than the data tied to unified customer profiles from authenticated users.

However, the real strength of a CDP comes from the more detailed analysis and personalisation available as a result of unifying data into a unique profile.

Let’s look at an example journey a customer may take across multiple devices, and how this fragmented behaviour is joined:

A new customer visits a website and browses several products on their laptop. This behaviour is tracked against a cookie identifier and stored in a profile fragment.
The customer then signs up for a customer account after being offered a discount promoted to new customers. During this registration a unique customer account ID is generated and they provide their email address. This sign-up is tracked to the profile fragment associated with the cookie identifier, which is enriched with the primary IDs of email address and customer account ID.
The customer leaves the site.
The customer then returns to the site on their mobile device and browses several other products. This is captured in a separate profile fragment associated with a new cookie identifier.
The customer adds an item to their basket and completes a purchase. During the purchase they log in to their account to use their new discount promotion and provide their name, email address and physical address, which are tracked into this new profile fragment. Because they have now authenticated, their customer account ID is also tracked. As the email address and customer account ID are considered primary IDs in this example configuration, the interactions on their mobile device can now be unified with the earlier session on the laptop.

The unification

Profile stitching happens in slightly different ways depending on the CDP being used. Essentially, when new interactions are tracked with a specified identifier, many CDPs will look for an existing profile with the same identifier. If one is found, the data from the new profile fragment is combined into that unified customer profile.
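The laptop and mobile journey described earlier can be sketched as a small in-memory store. This is a simplified illustration under our own assumptions, not how any specific CDP implements stitching: the class and field names are hypothetical, any shared identifier value links an interaction to a profile, and when one interaction matches several profiles they are merged into the oldest one.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    identifiers: dict = field(default_factory=dict)  # id type -> set of values
    events: list = field(default_factory=list)


class ProfileStore:
    """Toy deterministic stitching: exact identifier matches only."""

    def __init__(self):
        self.profiles = []  # kept in creation order
        self._index = {}    # (id_type, value) -> Profile

    def track(self, identifiers: dict, event: str) -> Profile:
        # Find every existing profile that shares an identifier with this event.
        hits = [self._index[key] for key in identifiers.items() if key in self._index]
        matched = [p for p in self.profiles if any(p is h for h in hits)]

        if matched:
            target = matched[0]  # the oldest matching profile survives
            for other in matched[1:]:
                self._merge(target, other)
        else:
            target = Profile()  # no match: start a new profile fragment
            self.profiles.append(target)

        # Enrich the profile with any identifiers seen on this interaction.
        for id_type, value in identifiers.items():
            target.identifiers.setdefault(id_type, set()).add(value)
            self._index[(id_type, value)] = target
        target.events.append(event)
        return target

    def _merge(self, target: Profile, other: Profile) -> None:
        # Carry historical events and identifiers over to the surviving profile.
        target.events.extend(other.events)
        for id_type, values in other.identifiers.items():
            target.identifiers.setdefault(id_type, set()).update(values)
            for value in values:
                self._index[(id_type, value)] = target
        self.profiles = [p for p in self.profiles if p is not other]


store = ProfileStore()
store.track({"cookie_id": "laptop-1"}, "browsed products on laptop")
store.track({"cookie_id": "laptop-1", "email": "sam@example.com",
             "customer_account_id": "42"}, "registered")
store.track({"cookie_id": "mobile-1"}, "browsed products on mobile")
store.track({"cookie_id": "mobile-1", "email": "sam@example.com",
             "customer_account_id": "42"}, "logged in and purchased")
```

After the final call, the mobile fragment has been merged into the profile created on the laptop, so the store holds a single profile carrying both cookie identifiers and all four events. A production system would also need rules for conflicting attributes and for shared devices, which this sketch ignores.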

The unification process is taken a step further for platforms that offer probabilistic matching. Data that has been matched deterministically can be used in the predictive modelling for probabilistic matching to connect anonymous activity to known users. An essential factor here is the quality of the data that is being used as training data. The higher the quality, the higher the confidence level will be in matching the anonymous behaviour to known profiles. Match rules, hierarchies, attribute priorities, event priorities, confidence levels, and profile merge logic are all then applied to construct the unified profiles.
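To illustrate how a confidence level might gate a probabilistic match, here is a deliberately simple sketch. Everything in it is an assumption for illustration: the signal names, the hand-picked weights (a real platform would learn these from deterministically matched data rather than hard-code them) and the threshold value.

```python
# Hypothetical signals and weights, expressed as points out of 100.
SIGNAL_WEIGHTS = {"ip_address": 40, "postcode": 30, "user_agent": 20, "timezone": 10}


def match_confidence(anonymous: dict, known: dict) -> int:
    """Score 0-100: the summed weight of signals that agree exactly."""
    score = 0
    for signal, weight in SIGNAL_WEIGHTS.items():
        value = anonymous.get(signal)
        if value is not None and value == known.get(signal):
            score += weight
    return score


def best_match(anonymous: dict, known_profiles: dict, threshold: int = 60):
    """Return (profile_id, score) for the best candidate at or above threshold.

    Returns (None, best_score) when no candidate is confident enough, so the
    anonymous behaviour stays unmatched rather than being wrongly attached.
    """
    best_id, best_score = None, 0
    for profile_id, signals in known_profiles.items():
        score = match_confidence(anonymous, signals)
        if score > best_score:
            best_id, best_score = profile_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score
```

Raising the threshold trades reach for accuracy: fewer anonymous sessions are attached to known profiles, but those that are attached are less likely to be wrong.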

Some platforms may offer integrations with third parties to offer probabilistic matching and enrich existing customer data to expand the reach and scope of activities.

One of the major benefits of profile stitching is that when this occurs it can tie together important historical data collected prior to the unified customer profile being created.

For example, we work with several businesses in the travel industry where new customers will browse their websites for holiday packages multiple times before authenticating and providing an identifier, e.g. by registering for newsletters or moving along a booking funnel. When they do, these businesses are able to promote specific packages tailored to their historically tracked behaviour, significantly increasing the chances of a booking being made.

Probabilistic matching methods are obviously useful in attempting to create profiles from the anonymous behaviour but the margin of error must be considered. These types of profiles are better suited for one-to-many use cases, where you are delivering a single experience to a broad audience and therefore incorrect matches won’t significantly harm your activities.

Tracking an unauthenticated customer on websites

Tracking an authenticated user is relatively easy, provided that the authenticated identifier is made available to the tracking. However, as most initial interactions on your website will be from anonymous customers, it is important to consider how these customers can be tracked.

Cookies, browser storage and device IDs are generally used. However, as mentioned earlier, their use for tracking purposes is becoming more restricted. For example, ITP primarily aims to prevent the tracking of users across domains. This affects first and third party cookies, browser storage, and setups where CNAME cloaking is used.

It is safe to assume that any implementation should avoid third party cookies entirely, given their inevitable death. However, even first party cookies set on the client side are significantly affected by ITP. Generally, data stored in these is given a 7 day expiration period. If the visitor arrives from a domain identified as a known tracker and the URL has a query parameter and/or fragment, this is further reduced to 24 hours.
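The effect of these caps on a client-side cookie can be expressed as a small calculation. The 7 day and 24 hour figures come from the paragraph above. The function name and parameters are hypothetical, and the sketch assumes the shorter cap applies only when the visitor arrived from a known tracker via a URL carrying a query parameter or fragment, which is our reading of WebKit's published behaviour and may change over time.

```python
from datetime import timedelta

ITP_DEFAULT_CAP = timedelta(days=7)     # cap on client-side first party cookies
ITP_TRACKER_CAP = timedelta(hours=24)   # cap after a decorated link from a tracker


def client_side_cookie_lifetime(requested: timedelta,
                                from_known_tracker: bool = False,
                                decorated_url: bool = False) -> timedelta:
    """Estimate how long a JavaScript-set cookie survives under ITP.

    Assumption: the 24 hour cap needs both conditions to hold; otherwise
    the 7 day cap applies. The requested lifetime wins if it is shorter.
    """
    if from_known_tracker and decorated_url:
        cap = ITP_TRACKER_CAP
    else:
        cap = ITP_DEFAULT_CAP
    return min(requested, cap)
```

So a cookie written by JavaScript with a requested lifetime of one year would, under these assumptions, last at most 7 days in an affected browser, and at most 24 hours for a visit that arrived through a decorated link from a known tracker.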

This makes understanding the behaviour of an unauthenticated individual over a length of time increasingly problematic. As mentioned, probabilistic matching methods do help in these scenarios but may become less accurate as restrictions tighten on the use of anonymous identifiers.

How can we avoid the short expiration period?

In short, setting the cookie with an identifier from the server allows you to persist the identifier for a longer period of time. For a more detailed description, here is the explanation we provided in an earlier article:

The server should create an anonymised identifier the first time that a visitor lands on the site. This is stored as a first-party cookie in the user’s browser, utilising the “Set-Cookie” header in the initial page response from the server. Because this first-party cookie (HTTP Cookie) is actually set by the site’s server and not by JavaScript executed in the browser (JS Cookie), it is not subject to the same expiry limits imposed under Apple’s ITP and other similar technologies*.

*Note that using CNAMEs (domain cloaking) to set the HTTP cookie IS subject to Apple’s ITP policy.

Setting the cookie with this method will allow you to set a suitable expiration period and as long as the customer doesn’t clear their browser data, you will be able to link visits together from the same unauthenticated visitor when they are using the same device and browser.
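As a minimal sketch of the server-side approach, the Python below uses only the standard library to decide whether a visitor already has an identifier and, if not, to build the value of a “Set-Cookie” header for the response. The cookie name, the one year lifetime and the attribute choices are assumptions for illustration; how the header is attached to the response depends on your web framework.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"            # hypothetical cookie name
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # example lifetime; choose your own


def visitor_cookie(request_cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the visitor already has an identifier,
    otherwise it is the value to send in a Set-Cookie response header.
    """
    cookies = SimpleCookie(request_cookie_header or "")
    if COOKIE_NAME in cookies:
        return cookies[COOKIE_NAME].value, None

    # First visit: create an anonymised, random identifier on the server.
    visitor_id = uuid.uuid4().hex
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    morsel = outgoing[COOKIE_NAME]
    morsel["max-age"] = ONE_YEAR_SECONDS
    morsel["path"] = "/"
    morsel["secure"] = True      # only send over HTTPS
    morsel["httponly"] = True    # not readable from JavaScript
    morsel["samesite"] = "Lax"
    return visitor_id, morsel.OutputString()
```

On the first request the function returns a new identifier and a header value to send back. On later requests from the same browser it reads the existing identifier from the “Cookie” header, so the visits can be linked.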

What else should you consider when planning your tracking?

Here are a couple of other important considerations when determining what identifiers to track and how they can be used:

Data Governance: Customer identifiers and other data points tracked about your customers’ behaviour should be used in line with your governance policies, whether existing or newly implemented, to support better data practices and adherence to data privacy regulations. How these identifiers are used alongside the tracked behaviour needs to be determined by your business. However, we normally recommend first splitting your data points into the 4 categories of data use detailed in a previous article. Once data has been categorised and tracked into these categories, it should be used for those purposes only.

Shared identifiers: Does your organisation have users accessing shared accounts, for example customers who access your site under an institutional login for a university or a business account? Ideally you want to track both a shared identifier and an individual identifier, so that you can understand behaviour at both a group level and an individual level. It is therefore important to consider a CDP vendor whose identification system supports this, to maximise what you are able to achieve with their platform.

Ultimately, the more behaviour that can be tracked with unique and persistent identifiers, the more detailed the single customer view that can be built. This unified profile underpins a successful digital customer experience, because it enables personalisation that is applied consistently across all of your customer touchpoints. Planning how to track the available identifiers across your touchpoints is therefore a fundamental step in approaching a CDP project.
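A small sketch shows why tracking both identifiers matters. It assumes each event carries a hypothetical shared account identifier and, where known, an individual user identifier, and it counts activity at either level from the same data.

```python
from collections import Counter


def activity_by(events, id_field: str) -> Counter:
    """Count events per identifier, skipping events where it is missing."""
    return Counter(e[id_field] for e in events if e.get(id_field))


# Hypothetical events from an institutional login shared by several people.
events = [
    {"shared_account_id": "uni-123", "user_id": "alice", "action": "download"},
    {"shared_account_id": "uni-123", "user_id": "bob", "action": "download"},
    {"shared_account_id": "uni-123", "user_id": "alice", "action": "search"},
    {"shared_account_id": "uni-123", "user_id": None, "action": "search"},
]

group_level = activity_by(events, "shared_account_id")
individual_level = activity_by(events, "user_id")
```

If only the shared identifier were tracked, all four events would appear to come from one customer. Keeping the individual identifier alongside it lets you separate the behaviour of each person while still reporting on the institution as a whole.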

In our next article in the series we will look at how to approach selecting a vendor.
