Identity matching has played a major role in digital advertising for several years. For much of that time, users have been identified based on cookie IDs, device IDs, and a few other parameters. These IDs help gather large amounts of audience information to deliver targeted ads on the open web.
However, tracking and identifying users has become difficult because of data privacy laws and the lack of third-party cookies in browsers such as Safari and Mozilla Firefox. With Google working to phase out support for third-party cookies, identity matching is arguably the most important problem we need to solve now. Besides, the shift of audiences towards connected devices has made the situation more complex than ever before.
According to the Pew Research Center, 84% of households in the United States had at least one smartphone in 2017, and one-third of Americans lived in a household with three or more smartphones. The usage of smartphones and CTV devices is expected to keep rising.
Advertisers are getting smarter and want a complete view of the audience so that they can reach relevant users. The ad-tech industry devised identity matching to help publishers and advertisers differentiate one user from another and track users across multiple devices.
So, before getting into the methodologies of identity matching, let’s look at why we need it in the first place.
Why Do We Need Identity Matching?
Publishers and advertisers use personalization to make advertising more relevant. The catch is that many users switch devices and channels while completing their journey (path to purchase).
For example, a user may first learn about an advertiser’s product on a website (via a display or video ad). Later, the user opens a third-party app (say, social media) and sees an ad from the same brand. Eventually, the user goes to the brand’s website to buy the product. This fragmented journey makes it difficult for publishers as well as advertisers to recognize that all three touchpoints belong to the same user.
Changes in IP addresses, not just changes in devices, also lead to inaccurate attribution. A Wi-Fi network is shared by several users on different devices. Take your office as an example, where all colleagues are connected to one Wi-Fi network during office hours.
But when they browse the internet via mobile data or a personal internet connection, their IP addresses vary. For this reason, identifying the individual visitor becomes a primary concern for publishers and advertisers.
Hence, to overcome this problem, identity matching is needed. It paves the way to reach the relevant audience across channels. Individualized ad serving is hard to do without identity matching because cookies can be deleted, CTV devices don’t support cookies or share a common identifier, and disparate data reduces the efficiency of targeted advertising.
In short, identity matching is required to display the right ads to your audience. To understand it better, let’s look at how it is done.
How Is Identity Matching Done?
Identity matching can be provided by data management platforms, customer data platforms or any other data aggregator. They create a set of data points to identify the same user across different channels, locations, and devices. In general, the data points are combinations of:
- Device identity points – IP addresses and other identifying information related to the device associated with the user.
- Digital identity points – Email addresses, social network links, website registrations, and so on.
- Terrestrial identity points – Home address, work address, phone number, etc.
The process of identifying a user is completed in the following steps:
- Identifying the platforms (websites, social media, etc.), channels (eCommerce app, in-store, etc.), and devices used by the user in the journey to convert, and connecting the dots between the devices.
- Matching the individual user to each device/platform/channel based on a set of attributes.
- Validating that the data set gathered across devices belongs to the same user.
That is a high-level overview of identity matching. In essence, identity matching can be done by two methods:
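The three steps above can be sketched in Python. This is only a minimal illustration, not a production identity graph: the `Touchpoint` class, the `build_profiles` and `validate_profile` helpers, and the choice of email as the matching attribute and phone number as the validation attribute are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Touchpoint:
    """One observation of a user on a device/channel (hypothetical structure)."""
    device_id: str
    channel: str
    # Identity points grouped as device, digital, and terrestrial.
    device_points: dict = field(default_factory=dict)
    digital_points: dict = field(default_factory=dict)
    terrestrial_points: dict = field(default_factory=dict)


def build_profiles(touchpoints, key="email"):
    """Steps 1-2: collect touchpoints and group them by a shared attribute."""
    profiles = defaultdict(list)
    for tp in touchpoints:
        value = tp.digital_points.get(key)
        if value:  # touchpoints without the attribute stay unmatched
            profiles[value.strip().lower()].append(tp)
    return dict(profiles)


def validate_profile(touchpoints, check_key="phone"):
    """Step 3: accept a profile only if a second attribute never conflicts."""
    seen = {
        tp.terrestrial_points[check_key]
        for tp in touchpoints
        if check_key in tp.terrestrial_points
    }
    return len(seen) <= 1
```

A laptop and a phone that report the same email end up in one profile, while a CTV device that exposes only an IP address stays unmatched until another method links it.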
- Deterministic Matching, and
- Probabilistic Matching.
What is Deterministic Matching?
Deterministic matching, also known as explicit matching, is a method of finding an exact match between two sets of data to identify the same user across different channels and devices. Users are matched based on the following identifiers:
- Email address,
- Phone number,
- Log-in details (user name, address, date of birth, etc.)
Usually, publishers already possess these details (first-party data), as they collect them when users sign up for newsletters, subscriptions, or any other service. A match is confirmed only when the user’s data matches exactly.
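As a rough sketch of exact matching, the Python snippet below compares two data sets by email. The function names are hypothetical, and normalizing and hashing the email with SHA-256 before comparison is an assumption of this example, not something every identity provider does in the same way.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivial differences don't block a match."""
    return email.strip().lower()


def hash_identifier(value: str) -> str:
    """Hash the identifier so raw emails need not be compared directly."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def deterministic_match(publisher_users: dict, advertiser_users: dict) -> list:
    """Return (publisher_id, advertiser_id) pairs whose hashed emails are identical."""
    index = {
        hash_identifier(normalize_email(email)): pid
        for pid, email in publisher_users.items()
    }
    matches = []
    for aid, email in advertiser_users.items():
        pid = index.get(hash_identifier(normalize_email(email)))
        if pid is not None:  # only an exact hash match counts
            matches.append((pid, aid))
    return matches
```

There is no partial credit here: a user either matches exactly or is left out, which is why this method is accurate but limited to users who have shared an identifier.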
Deterministic matching is highly accurate and therefore improves the user experience, since users are less likely to see irrelevant ads or offers. Its drawback is that it doesn’t work for publishers who don’t store email addresses or other basic information about their users.
For this reason, many publishers have started collecting deterministic data to improve their targeted advertising by encouraging visitors to share their email addresses. Think of freewalls: you register a free account with your favourite website so that it can deliver a personalised experience while you are on the site.
One example of a deterministic identifier is the Google ID. Google generates it when you create a Google account and uses it to identify users and personalise the ad experience across its own properties and partnered third-party sites.
Sidenote: While the Google ID relies on deterministic matching, Google’s DoubleClick ID (its advertising-specific ID) is based on probabilistic matching. As a publisher, if you use any of Google’s ad products (Google Ad Manager, Google AdX, AdSense, etc.), your users will be cookied with DoubleClick cookies (the DoubleClick ID).
What is Probabilistic Matching?
Probabilistic matching, also known as implicit matching, compares several data points together to identify the same user across multiple devices, channels, and platforms. In general, data aggregators and identity solution providers use a knowledge database and predictive algorithms during the identification process.
In probabilistic matching, devices are linked by looking into the following data points:
- IP addresses, Wi-Fi networks,
- Device fingerprinting,
- Screen resolution,
- Operating system, and so on.
By combining several data points, an algorithm estimates the statistical likelihood that different devices belong to the same user. To understand how it works, here’s an example of probabilistic matching.
Assume a phone and desktop linked to a household are observed logging onto Wi-Fi at all times of the day throughout the week. Meanwhile, another device that belongs to a friend only logs onto Wi-Fi on the weekends. An algorithm can use this data point in combination with others to infer that the friend’s device does not belong to the same household.
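The household example above can be turned into a small scoring sketch in Python. The signal names, the weights, and the 0.8 threshold are illustrative assumptions chosen for this example; real systems typically learn such parameters from data.

```python
def wifi_day_overlap(days_a: set, days_b: set) -> float:
    """Jaccard overlap of the days two devices were seen on the same Wi-Fi."""
    if not days_a or not days_b:
        return 0.0
    return len(days_a & days_b) / len(days_a | days_b)


def match_score(dev_a: dict, dev_b: dict, weights=None) -> float:
    """Weighted likelihood-style score in [0, 1] that two devices share a household."""
    # Hypothetical weights: Wi-Fi usage pattern counts most, then IP, then time zone.
    weights = weights or {"wifi": 0.6, "ip": 0.3, "timezone": 0.1}
    score = weights["wifi"] * wifi_day_overlap(dev_a["wifi_days"], dev_b["wifi_days"])
    score += weights["ip"] * (dev_a["ip"] == dev_b["ip"])
    score += weights["timezone"] * (dev_a["timezone"] == dev_b["timezone"])
    return score


WEEK = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
phone = {"wifi_days": WEEK, "ip": "203.0.113.9", "timezone": "UTC-5"}
desktop = {"wifi_days": WEEK, "ip": "203.0.113.9", "timezone": "UTC-5"}
friend = {"wifi_days": {"Sat", "Sun"}, "ip": "203.0.113.9", "timezone": "UTC-5"}

THRESHOLD = 0.8  # assumed cut-off for linking two devices
same_household = match_score(phone, desktop) >= THRESHOLD
friend_linked = match_score(phone, friend) >= THRESHOLD
```

Even though the friend’s device shares the IP address and time zone, its weekend-only Wi-Fi pattern keeps its score below the threshold, so it is not linked to the household.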
Probabilistic matching offers scale, and it can reach the desired accuracy when deterministic data points are used as a foundation. On its own, however, it is less accurate than deterministic matching and lacks transparency in how users are identified.
Also, with several privacy laws and regulations in place and more expected, probabilistic matching is likely to become harder to sustain, as it relies on information such as IP addresses and device IDs that many regulations treat as personally identifiable information.
Google has already made the DoubleClick ID unavailable to advertisers for analysis and attribution. However, it has recently enabled an alternative for analysis through its own data clean room, Ads Data Hub.
Deterministic Matching Vs Probabilistic Matching
It’s difficult to say which option is better, as it depends on the publishers’ and advertisers’ requirements. In some cases, only probabilistic matching can work. In other cases, you may need to combine both to have a better chance of matching users. In general, it is recommended to:
- Get started with a first-party data strategy and see whether you can collect deterministic data points (for instance, email addresses) once a user has shown a sign of interest. Many publishers are leveraging the current spike in traffic to establish a strong relationship with visitors and convert them into known readers.
- Make use of probabilistic matching as long as it exists, as advertisers typically want to extend their reach. They can’t run programmatic campaigns on the open web by matching just tens of thousands of users. If you are running a direct deal, you can sync first-party data with the advertiser to improve conversion rates, but the generic open auction requires probabilistic IDs.
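One way to combine the two methods is to try an exact match first and fall back to a probabilistic score only when no deterministic identifier is available. The Python sketch below assumes hypothetical record fields (`id`, `email`, `ip`, `os`, `screen`), a simple shared-signal score, and an arbitrary 0.8 threshold.

```python
def shared_signal_score(a: dict, b: dict, keys=("ip", "os", "screen")) -> float:
    """Fraction of the given signals that are present and equal in both records."""
    hits = sum(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)
    return hits / len(keys)


def resolve_identity(candidate: dict, known_users: list, threshold: float = 0.8):
    """Return (user_id, method); method is 'deterministic', 'probabilistic' or 'unmatched'."""
    # 1. Deterministic pass: exact match on the normalized email, if there is one.
    email = (candidate.get("email") or "").strip().lower()
    if email:
        for user in known_users:
            if (user.get("email") or "").strip().lower() == email:
                return user["id"], "deterministic"

    # 2. Probabilistic fallback: pick the best-scoring known user above the threshold.
    best_id, best_score = None, 0.0
    for user in known_users:
        score = shared_signal_score(candidate, user)
        if score > best_score:
            best_id, best_score = user["id"], score
    if best_score >= threshold:
        return best_id, "probabilistic"
    return None, "unmatched"
```

Returning the method alongside the ID lets a publisher keep deterministic matches for direct deals and treat probabilistic ones as lower-confidence reach extension.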
Before you decide which identity matching methodology to use, consider data-specific factors such as the source of the data, its quality, its timeliness (how old or new the information is), and its accuracy. The secret to success is somewhat complicated. Publishers must closely analyze their first-party data and work closely with their data aggregators (if they use any) to understand the audience data. Have any questions? Let us know in the comments.