What Is Identity Matching and How Does It Work?

Updated on: February 13, 2024

Identify users across multiple devices and serve personalized content to improve engagement rates.

Identity matching has played a major role in digital advertising for several years. For most of that time, users have been identified by cookie IDs, device IDs, and other parameters. These IDs help gather large amounts of audience information used to deliver targeted ads on the open web.

However, tracking and identifying users has become difficult because of data privacy laws and the lack of third-party cookie support in browsers such as Safari and Mozilla Firefox. With Google working to phase out support for third-party cookies, identity matching is arguably the most important problem we need to solve now. Besides, the audience’s shift towards connected devices has made the situation more complex than ever before.

Advertisers want a complete view of the audience so that they can reach relevant users. Identity matching was devised by the ad-tech industry to help publishers and advertisers differentiate one user from another and track users across multiple devices. So, before getting into the methodologies of identity matching, let’s look at why we need it.

Table of Contents

The Need for Identity Matching
How Is Identity Matching Done?
Deterministic Matching Vs. Probabilistic Matching
Final Thoughts

The Need for Identity Matching

Publishers and advertisers use personalization to make advertising more relevant. The catch is that many users switch devices and channels while completing their journey (path-to-purchase).

For example, a user may first learn about an advertiser’s product on a website (via a display or video ad). Later, the user opens a third-party app (say, social media) and sees an ad from the same brand. Eventually, the user goes to the brand’s website to buy the product. Because the journey is fragmented across these touchpoints, it becomes difficult for publishers as well as advertisers to recognize that it was the same user throughout.

Changes in IP addresses, and not only changes in devices, also lead to inaccurate attribution. Several users often share one WiFi network across different devices. Take your office as an example, where colleagues are connected to the same WiFi during office hours.

But when they surf the internet via mobile data or a personal internet connection, their IP addresses change. For this reason, telling individual visitors apart becomes a primary concern for publishers and advertisers.

Hence, identity matching is needed to get past these problems. It paves the way to reach the relevant audience across channels. Individualized ad serving is hard to do without identity matching because cookies can be deleted, CTV devices don’t have unique identifiers, and disparate data reduces the efficiency of targeted advertising.

So, in various ways, identity matching is required to display the right ads to your audience. To understand it better, you need to understand how it is done.

How Is Identity Matching Done?

Identity matching can be provided by data management platforms, customer data platforms, or any other data aggregator. They collect data points to identify the same user across different channels, locations, and devices. In general, the data points are combinations of:

Device identity points – IP addresses and other identifying information related to the device associated with the user.
Digital identity points – Email addresses, social network links, website registrations, etc.
Terrestrial identity points – Home address, work address, phone number, etc.

The process of identifying a user is completed in the following steps:

Identifying the platforms (websites, social media, etc.), channels (eCommerce app, in-store, etc.), and devices used by the user in the journey to convert and connect the dots between the devices.
Matching the individual user to each device/platform/channel based on attributes.
Validating that the data collected across devices belongs to the same user.
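
The steps above can be pictured as building a small identity graph. The sketch below is only an illustration under stated assumptions: the `IdentityGraph` class, the identifier strings, and the idea of storing links in a union-find structure are all hypothetical and are not taken from any particular vendor’s implementation.

```python
from collections import defaultdict


class IdentityGraph:
    """Groups identifiers that have been linked to the same user (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Register unseen identifiers as their own single-member group.
        self._parent.setdefault(node, node)
        # Walk up to the group's root, shortening the path as we go.
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a, b):
        """Record that identifiers a and b were matched to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a, b):
        """Validate whether two identifiers ended up in the same profile."""
        return self._find(a) == self._find(b)

    def profiles(self):
        """Return every group of identifiers as a set."""
        groups = defaultdict(set)
        for node in list(self._parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


graph = IdentityGraph()
# Steps 1 and 2: hypothetical observations that tie a cookie and a device
# to an email address the user logged in with.
graph.link("cookie:abc123", "email:jane@example.com")
graph.link("device:phone-42", "email:jane@example.com")
graph.link("cookie:zzz999", "email:sam@example.com")

# Step 3: check whether two identifiers resolve to the same user.
print(graph.same_user("cookie:abc123", "device:phone-42"))  # True
print(graph.same_user("cookie:abc123", "cookie:zzz999"))    # False
```

In this sketch the cookie and the phone are never observed together directly; they end up in one profile only because both were linked to the same email address.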

This is a high-level overview of identity matching. In essence, identity matching can be done by two methods:

Deterministic Matching, and
Probabilistic Matching.

What Is Deterministic Matching?

Deterministic matching, or explicit matching, is a method to find the exact match between two data sets to identify the same user across different channels and devices. Users are matched based on the following identifiers:

Email address,
Phone number,
Log-in details (user name, address, date of birth, etc.)

Usually, publishers already possess these details (first-party data), as they collect them when users sign up for newsletters, subscriptions, or any other service. A match is confirmed only when the same identifier appears in both data sets.
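
As a rough illustration of the exact-match rule, the sketch below joins two record lists on email address. The record layout, the function names, and the choice to normalize and SHA-256 hash the address before comparing are assumptions made for this example; the article does not prescribe any of them.

```python
import hashlib


def normalize_email(email):
    """Trim whitespace and lowercase so trivial differences don't block a match."""
    return email.strip().lower()


def hash_email(email):
    """Hash the normalized address so raw emails need not be compared directly."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def deterministic_matches(publisher_records, partner_records):
    """Return (publisher_id, partner_id) pairs whose hashed emails are identical."""
    partner_by_hash = {hash_email(r["email"]): r["id"] for r in partner_records}
    matches = []
    for record in publisher_records:
        key = hash_email(record["email"])
        if key in partner_by_hash:
            matches.append((record["id"], partner_by_hash[key]))
    return matches


# Hypothetical first-party records on each side.
publisher = [
    {"id": "pub-1", "email": "Jane@Example.com "},
    {"id": "pub-2", "email": "sam@example.com"},
]
partner = [
    {"id": "adv-9", "email": "jane@example.com"},
    {"id": "adv-7", "email": "alex@example.com"},
]

print(deterministic_matches(publisher, partner))  # [('pub-1', 'adv-9')]
```

Only the record whose email is the same on both sides is returned; there is no partial credit, which is what makes the method accurate but limited to users who have shared an identifier.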

Deterministic matching has higher accuracy and hence improves the user experience, since a correctly identified user is less likely to see irrelevant ads or offers. However, deterministic matching is of little use to publishers who don’t store email addresses and other basic user information.

For this reason, many publishers have started collecting deterministic data to improve their targeted advertising by encouraging visitors to share their email addresses. Think of freewalls: you register a free account with your favorite website so that it can deliver a personalized experience while you are on the site.

One example of a deterministic identifier is Google ID. Google generates a Google ID when you create a Google account and uses it to identify users and personalize the ad experience across its properties and partnered third-party sites.

Sidenote: While Google ID is based on deterministic matching, Google’s DoubleClick ID (its advertising-specific ID) is based on probabilistic matching. As a publisher, if you use any of Google’s ad products (Google Ad Manager, Google AdX, AdSense, etc.), your users will be cookied with DoubleClick cookies (or DoubleClick ID).

What Is Probabilistic Matching?

Probabilistic matching, or implicit matching, compares several data points to identify the same user across multiple devices/channels/platforms. In general, the data aggregators and identity solution providers use a knowledge database and predictive algorithms during the identification process.

In probabilistic matching, devices are linked by looking into the following data points:

IP addresses, Wi-Fi networks,
Device fingerprinting,
Screen resolution,
Operating system, and so on.

Combining several data points narrows down a user, and statistical likelihood is used to decide whether different devices belong to the same user. To understand how it works, here’s an example of probabilistic matching.

From LiveRamp:

Assume a phone and desktop linked to a household are observed logging onto Wi-Fi at all times of the day throughout the week. Meanwhile, another device that belongs to a friend only logs onto Wi-Fi on the weekends. An algorithm can use this data point with others to infer that the friend’s device does not belong to the same household.
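
The household example above can be turned into a toy scoring function. This is only a sketch under assumptions: the two signals, their weights, and the 0.7 threshold are made up for illustration, and real identity providers use far more signals and trained models.

```python
def jaccard(a, b):
    """Share of items the two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical weights and cut-off; real systems would learn these from data.
WEIGHTS = {"same_ip": 0.5, "wifi_days": 0.5}
THRESHOLD = 0.7


def match_score(device_a, device_b):
    """Combine weak signals into a single likelihood-style score."""
    score = 0.0
    if device_a["ip"] == device_b["ip"]:
        score += WEIGHTS["same_ip"]
    # Devices seen on the Wi-Fi on the same days are more likely to live together.
    score += WEIGHTS["wifi_days"] * jaccard(device_a["wifi_days"], device_b["wifi_days"])
    return score


def likely_same_household(device_a, device_b):
    return match_score(device_a, device_b) >= THRESHOLD


all_week = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
phone = {"ip": "203.0.113.7", "wifi_days": all_week}
desktop = {"ip": "203.0.113.7", "wifi_days": all_week}
friend_phone = {"ip": "203.0.113.7", "wifi_days": ["sat", "sun"]}

print(likely_same_household(phone, desktop))       # True
print(likely_same_household(phone, friend_phone))  # False
```

All three devices share the same IP address, so that signal alone cannot separate them; it is the weekday pattern that pushes the friend’s phone below the threshold. The result is an inference, not a confirmed match.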

Probabilistic matching can provide scalability, and it can reach the desired accuracy if deterministic data points are used as a foundation. On its own, however, it is less accurate than deterministic matching and lacks transparency in how users are identified.

Also, with several privacy laws and regulations in place and more expected, probabilistic matching may not remain viable for long, as it relies on information such as IP addresses and device IDs that can be considered personally identifiable information.

Google has already announced that DoubleClick ID is no longer available to advertisers for analysis and attribution. However, it has enabled an alternative for performing analysis via its own data clean room, Ads Data Hub.

Deterministic Matching Vs. Probabilistic Matching

It’s difficult to say which option is better, as it depends on the publishers’ and advertisers’ requirements. In some cases, only probabilistic matching can work. In other cases, you may need to combine both to match users better. In general, it is recommended to:

Get started with a first-party data strategy and see if you can collect deterministic data points (for instance, email addresses) if the user has already shown interest. Many publishers are leveraging the current spike in traffic to establish a strong relationship with visitors and convert them into known readers.
Use probabilistic matching as long as it exists, as advertisers typically want to extend their reach. They can’t run programmatic campaigns on the open web by matching with just tens of thousands of users. If you run a direct deal, you can sync first-party data with the advertiser to improve conversion rates, but a generic open auction requires probabilistic IDs.

Final Thoughts

Identity matching is a critical aspect of advertising for publishers, as it allows them to increase ad revenue and improve targeting and relevance. However, before choosing an identity-matching methodology, you should carefully consider data-specific factors such as the source of data collection, data quality, timeliness, and accuracy.

Additionally, you should analyze first-party data and work closely with data aggregators to better understand the audience data. Doing so allows you to use identity matching responsibly without violating privacy laws to improve ad performance and revenue.