Accurate data collection is extremely important, but if you’re not organising that data in an intuitive and flexible way then it’s (often literally) useless.
The chasm separating ‘correct’ and ‘optimal’ can be the difference between genuinely data-driven decision-making and the superficial validation exercises that many of us reluctantly accept.
GA4 reporting identities could be the biggest opportunity to close that gap for your business, and that’s what I’ll be taking you through today.
Overview
What is a reporting identity and where can I find it?
How does the Device-based reporting identity work?
How does the Observed reporting identity work?
Why do I sometimes see a higher user count when using the Observed reporting identity?
How does the Blended reporting identity work?
Which reporting identity should I be using?
What is a reporting identity and where can I find it?
In short, reporting identities (RIs) are simply different methods of counting users in your property. The goal behind having more than one possible counting method for users is to ensure that the user count displayed in reporting is as close as possible to the actual number of people who have visited your site/app.
There are 3 reporting identities available (Device-based, Observed, and Blended). We’ll explain below, in detail, how each one works, but it’s important to note that the reporting identity can be changed at any time, and the data displayed in reports will be updated retroactively as well.
Switching between reporting identities won’t have any permanent impact on the data already collected and won’t change how GA4 collects new data.
However, you must be aware that changing the reporting identity doesn’t just change the data displayed in your own reports. The change applies at the property level, so it will affect the data shown to all users with access to that property. As such, here are the top 3 aspects to keep in mind when changing RIs:
- Changing the reporting identity shouldn’t be done on a whim. If you need to change it temporarily, it should be communicated internally to anyone running reports on that property. Otherwise, users might extract reports just as you’ve changed the identity, causing a lot of confusion.
- Don’t forget to change it back to whatever the agreed RI for that property is.
- Don’t forget to check the RI on the property before running any reports. Not everyone remembers to change it back, so do your own check first.
Reporting identities in GA4 can be found in the Admin Section, under Data Display.
The Device-based reporting identity may be hidden, in which case you’ll have to click “Show all” to reveal it.
How does the Device-based reporting identity work?
This reporting identity counts/identifies users in the exact same way as Universal Analytics used to, based on the Device ID (for websites this information comes from the Client ID, representing a unique device-browser combination, while for apps the data comes from the users’ app instance ID).
This means that the exact same person visiting your app from their phone and from their tablet, then visiting your site from their laptop using Chrome and from the same laptop using Firefox will show up as 4 different users in reporting, based on each Device ID.
The obvious disadvantage of this counting method is that, as explained above, it inflates the user count, so the number of users shown in reporting could be quite far off from the actual number of people visiting the site. The user count can also be inflated by users reinstalling the app, clearing the app cache, or clearing browser cookies, thereby “resetting their identity”. Moreover, this reporting identity doesn’t give us a unified view of a person’s behaviour across platforms and devices.
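To make the inflation concrete, here is a minimal Python sketch of Device-based counting applied to the phone, tablet, Chrome and Firefox example above. The event records, the `person` field and the identifier values are all invented for illustration; GA4 never sees a `person` field, which is exactly the problem.

```python
# Hypothetical events: one real person seen in four device/browser contexts.
events = [
    {"person": "alice", "platform": "app", "device_id": "app-instance-phone"},
    {"person": "alice", "platform": "app", "device_id": "app-instance-tablet"},
    {"person": "alice", "platform": "web", "device_id": "client-id-chrome"},
    {"person": "alice", "platform": "web", "device_id": "client-id-firefox"},
]


def device_based_user_count(events):
    """Count users as distinct Device IDs, ignoring any other identifier."""
    return len({event["device_id"] for event in events})


actual_people = len({event["person"] for event in events})
reported_users = device_based_user_count(events)

print(actual_people, reported_users)  # 1 4
```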
How does the Observed reporting identity work?
The Observed reporting identity utilises a combination of User ID and Device ID to track users. It prioritises the User ID, if it has been collected. When User ID data isn’t available, GA4 automatically falls back on the Device ID.
A User ID usually represents the unique identifier generated in your backend systems when users create an account on your website/app. It is therefore available to be sent on any platform, as long as the user logs in on that platform (your devs would have to surface that information into the data layer or relevant app storage / SDK).
In theory, because both the website and the app can pass a consistent User ID regardless of what device (or device/browser combination) is used, this reporting identity should help deduplicate users across devices and platforms.
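The "User ID first, Device ID as fallback" rule can be sketched in a few lines of Python. This is only an illustration of the priority order, under the assumption that each event carries a `device_id` and an optional `user_id` (both field names are hypothetical here); it deliberately ignores how GA4 stitches events within a session.

```python
def observed_identifier(event):
    """Prefer the User ID; fall back to the Device ID when it is missing."""
    return event.get("user_id") or event["device_id"]


# Hypothetical events: the same account on two devices, plus one browser
# where the person never logged in.
events = [
    {"device_id": "app-instance-phone", "user_id": "u-123"},
    {"device_id": "client-id-chrome", "user_id": "u-123"},
    {"device_id": "client-id-firefox", "user_id": None},
]

observed_users = {observed_identifier(event) for event in events}
print(len(observed_users))  # 2
```

The two logged-in devices collapse into one user, while the logged-out browser is still counted separately through its Device ID.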
Why do I sometimes see a higher user count when using the Observed reporting identity?
Many GA4 users are surprised to see a higher number of users with the Observed reporting identity (which should, in theory at least, help deduplicate logged-in users across platforms and devices) than with the Device-based one. Although this may seem counterintuitive, it is in fact expected, due to how user stitching works.
To understand the inner workings of this reporting identity let’s imagine the following user navigational pattern:
- Day 1 – User navigates to the site for the first time and browses through products, but doesn’t create an account or log in during this session.
- Day 2 – User navigates back to the site, browses some more, creates an account and logs in, then continues navigating and eventually leaves the site.
- Day 3 – User comes back to the site, still logged in from their previous session and navigates through the site. Their login either expires automatically within the session or they manually log out and then generate a few more events while logged out and leave the site.
*The above assumes all 3 sessions happen on the same device and browser.
The results would be the following:
- Device-based reporting identity
[Screenshot from GA4 interface test]
- Observed reporting identity
[Screenshot from GA4 interface test]
Now let’s understand why:
- On the first visit to the site, there is no User ID available, so the Observed reporting identity will fall back to using the Device ID for user identification/counting and it will count “User 1”.
- On the second visit to the site, the first few events of the session, before the user logs in, only have the Device ID associated with them (the same Device ID as in session 1). Once the user is logged in, all events have both the Device ID and the User ID associated with them. Based on that, Google is able to stitch the pre-login events from that session to the same User ID as the post-login hits. However, Google only does this type of stitching within the session. The User ID surfaced in the second session won’t be stitched to the hits from the first session, even though the logged-out hits from sessions 1 and 2 share the same Device ID and the stitching could, in theory, be done just as easily. Presumably for reasons of its own, Google doesn’t do that.
- On the third visit to the site, while the user is still logged in, identification is based on the User ID, so until they log out they are identified as the same user from the second visit. However, as soon as they log out, stitching stops: the user is once again identified by their Device ID and is seen as the same user from the first visit. This is because, even though it is technically possible, Google doesn’t stitch events generated after the user has logged out of their account. Moreover, because the user identification changes mid-session on that third visit, 2 sessions are reported for it instead of one.
It’s important to note that when a user logs out and the identification method changes, no new session_start event is triggered. Although two sessions are reported against the Client ID (in the screenshot it’s the ID starting with 744), when reviewing the data in User Explorer only one session_start event appears for it: the one linked to the user’s first visit. The subsequent session_start events, which correspond to the user’s second and third visits, are associated with the User ID (in the screenshot that’s the ID starting with 22):
User Identification | Session Count | session_start Event Count |
Client ID | 2 (first visit; third visit, post logout) | 1 (first visit) |
User ID | 2 (second visit; third visit, before logout) | 2 (second visit; third visit) |
Total | 4 sessions | 3 session_start events |
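The three-day scenario can be replayed with a small Python model to check the session counts in the table. This is a simplified sketch of the behaviour described above, not Google’s actual implementation: the session labels `s1` to `s3` are invented, `744` and `22` stand in for the Client ID and User ID from the screenshots, and the only stitching rule modelled is "pre-login events inherit the User ID that appears later in the same session".

```python
from collections import defaultdict

# One tuple per event, in order: (session, device_id, user_id).
# user_id is None while the user is logged out.
EVENTS = [
    ("s1", "744", None), ("s1", "744", None),                       # day 1
    ("s2", "744", None), ("s2", "744", "22"), ("s2", "744", "22"),  # day 2
    ("s3", "744", "22"), ("s3", "744", "22"),                       # day 3, logged in
    ("s3", "744", None), ("s3", "744", None),                       # day 3, logged out
]


def observed_pairs(events):
    """Yield (session, identifier), stitching only pre-login events in a session."""
    # The first User ID seen in each session, if any.
    first_user = {}
    for session, _, user in events:
        if user and session not in first_user:
            first_user[session] = user

    login_seen = set()  # sessions in which a logged-in event has already occurred
    for session, device, user in events:
        if user:
            login_seen.add(session)
            yield session, user
        elif session in first_user and session not in login_seen:
            yield session, first_user[session]  # pre-login event, stitched
        else:
            yield session, device  # no login in this session, or post-logout


def sessions_per_identifier(pairs):
    """Count distinct sessions reported against each identifier."""
    seen = defaultdict(set)
    for session, identifier in pairs:
        seen[identifier].add(session)
    return {identifier: len(sessions) for identifier, sessions in seen.items()}


device_based = sessions_per_identifier((s, d) for s, d, _ in EVENTS)
observed = sessions_per_identifier(observed_pairs(EVENTS))

print(device_based)  # {'744': 3}
print(observed)      # {'744': 2, '22': 2}
```

Under these assumptions the model gives one user with three sessions for Device-based, and two users with four sessions in total for Observed, which matches the table.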
Important considerations for both the Device-based and Observed reporting identities:
- These reporting identities include only data from consented users.
- Even if you have an Advanced Consent Mode v2 implementation, unconsented data will not be reflected at all in reporting when using these reporting identities.
- The user count shown in the GA4 interface when using either of these reporting identities can be replicated almost exactly in BigQuery (when Advanced Consent Mode is in play, conditions have to be added to the BQ logic to ignore cookieless pings from unconsented users).
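As a rough illustration of the kind of condition meant in the last point, the Python sketch below drops unconsented rows before counting users. It works on in-memory rows rather than real BigQuery, and everything about the rows is an assumption: the field names are loosely modelled on the GA4 export (`user_pseudo_id`, `user_id`, and a flattened `analytics_storage` consent flag), the sample values are invented, and treating `"No"` as the marker for a cookieless ping should be verified against your own export.

```python
# Hypothetical exported rows; "analytics_storage" is assumed to be "Yes" or "No".
rows = [
    {"user_pseudo_id": "744", "user_id": None, "analytics_storage": "Yes"},
    {"user_pseudo_id": "744", "user_id": "22", "analytics_storage": "Yes"},
    {"user_pseudo_id": "901", "user_id": None, "analytics_storage": "Yes"},
    {"user_pseudo_id": "ping-a", "user_id": None, "analytics_storage": "No"},
    {"user_pseudo_id": "ping-b", "user_id": None, "analytics_storage": "No"},
]


def consented_rows(rows):
    """Keep only rows from consented users, dropping cookieless pings."""
    return [row for row in rows if row["analytics_storage"] != "No"]


def device_based_users(rows):
    """Distinct Device IDs among consented rows only."""
    return len({row["user_pseudo_id"] for row in consented_rows(rows)})


print(device_based_users(rows))  # 2
```

The same filter would sit in front of any Observed-style logic; only the identifier chosen per row would differ.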
How does the Blended reporting identity work?
The Blended reporting identity counts users based on the User ID if it is collected. If User ID information is not available, then Analytics uses the Device ID. If no identifier is available, Analytics uses modelling.
However, modelling only comes into play if anonymised data is collected from unconsented users (Advanced Consent Mode) and if your property meets all the required criteria (including data quality) for Google to apply modelling on that anonymised user data to estimate user count. Otherwise, this note will be seen in the property:
Important Considerations:
- If you have an Advanced Consent Mode implementation (not basic), along with a high enough data volume and sufficient data quality for modelling, then this reporting identity will include modelled data for unconsented traffic and therefore display a higher user count compared to the other 2 identities.
- When modelling is active, the user count shown in the GA4 interface for this reporting identity CANNOT be replicated in BigQuery. The unconsented data is exported to BigQuery, but the modelling applied to it is proprietary to Google, so we can’t apply the same modelling to arrive at a similar user count.
- This poses a significant challenge for some businesses, causing them to opt out of using the Blended reporting identity even in the interface.
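The Blended fallback order, and the conditions under which modelling actually applies, can be summarised as a small decision function. This is a sketch of the logic as described in this section rather than anything exposed by GA4: the function name, its parameters and the `"not_counted"` label are all hypothetical.

```python
from typing import Optional


def blended_method(
    user_id: Optional[str],
    device_id: Optional[str],
    advanced_consent_mode: bool,
    meets_modelling_criteria: bool,
) -> str:
    """Return which method would identify this traffic under Blended."""
    if user_id:
        return "user_id"
    if device_id:
        return "device_id"
    # No identifier at all: modelling is only an option with Advanced Consent
    # Mode AND a property that meets Google's modelling criteria.
    if advanced_consent_mode and meets_modelling_criteria:
        return "modelling"
    return "not_counted"


print(blended_method("22", "744", True, True))   # user_id
print(blended_method(None, "744", True, True))   # device_id
print(blended_method(None, None, True, True))    # modelling
print(blended_method(None, None, False, True))   # not_counted
```

When the last branch is the one that applies, Blended has nothing extra to add, which is why its numbers then look the same as Observed.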
Which reporting identity should I be using?
As with most things, there is no one-size-fits-all answer. The choice of reporting identity depends on the business, but we can share here some considerations for choosing your reporting identity.
- Does the business have and collect a User ID?
  - Without it, the Observed RI has nothing to prioritise and simply falls back on the Device ID, which makes it irrelevant.
- Does the business have an Advanced or a Basic Consent Mode implementation?
  - With a Basic Consent Mode implementation, there is no modelling in the property and therefore the Blended reporting identity wouldn’t be much different from Observed.
- Does the business often use both data from the GA4 interface and data from BigQuery?
  - Given that the Blended reporting identity can’t be replicated in BQ, using this identity in the interface can lead to conflicting numbers between the GA4 interface and BQ. This can easily lead to mistrust in the data, in which case the benefits of Blended won’t outweigh the drawbacks.
Ultimately the best option depends on the unique needs and data collection practices of your business. Hopefully this article provided you with an understanding of how reporting identities work, why the results may seem counterintuitive at times, and some of the key considerations for choosing a reporting identity that will provide the most accurate insights for your reporting needs.