Date posted

22 Feb 2023

What is data thresholding in Google Analytics 4?

Data thresholding is designed to prevent you from identifying individual users when viewing a report or exploration in Google Analytics 4.

It is one of a number of measures the platform takes to protect user privacy.

Data thresholding is new in Google Analytics 4. It was not part of Universal Analytics or earlier versions of Google Analytics.

How does data thresholding work?

In reporting situations where the identity of an individual user could be inferred (for example, because of a low user or event count), data thresholding hides that data from your Google Analytics 4 report output to protect privacy.

Reports that are subject to data thresholding are therefore based on a subset of users, which can cause confusion and create multiple sources of truth during analysis.

Data thresholding is hard-coded into Google Analytics 4 and cannot be switched on or off, and the parameters that govern it cannot be changed.
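To make the mechanism concrete, here is a toy Python model of how row-level thresholding can undercount a report. It is a sketch under stated assumptions, not Google's implementation: Google does not publish the actual threshold or the rules that trigger it, so `MIN_USERS_PER_ROW` (set to 50 here) and the `thresholded_report` function are hypothetical. The model counts distinct users per dimension value over a date range and withholds any row that falls below the threshold, without modelling anything in its place.

```python
from collections import defaultdict
from datetime import date

# HYPOTHETICAL threshold: Google does not publish the real value or the rules
# that trigger thresholding. 50 is used purely for illustration.
MIN_USERS_PER_ROW = 50


def thresholded_report(events, start, end, min_users=MIN_USERS_PER_ROW):
    """Toy model: count distinct users per dimension value between start and
    end (inclusive), withholding rows whose user count is below min_users.

    events: iterable of (event_date, dimension_value, user_id) tuples.
    Returns (visible_rows, withheld_row_count).
    """
    users_by_value = defaultdict(set)
    for event_date, value, user_id in events:
        if start <= event_date <= end:
            users_by_value[value].add(user_id)

    visible, withheld = {}, 0
    for value, users in users_by_value.items():
        if len(users) >= min_users:
            visible[value] = len(users)
        else:
            withheld += 1  # the row is dropped; nothing is modelled in its place
    return visible, withheld


# Synthetic data: 120 UK users on 1 Feb, and 3 new Irish users every day in February.
events = [(date(2023, 2, 1), "United Kingdom", f"uk-{i}") for i in range(120)]
events += [
    (date(2023, 2, day), "Ireland", f"ie-{day}-{i}")
    for day in range(1, 29)
    for i in range(3)
]

print(*thresholded_report(events, date(2023, 2, 1), date(2023, 2, 1)))
# {'United Kingdom': 120} 1
print(*thresholded_report(events, date(2023, 2, 1), date(2023, 2, 28)))
# {'United Kingdom': 120, 'Ireland': 84} 0
```

In this toy model, the single-day report withholds the Ireland row entirely (3 users), so the visible total is 120 rather than 123. Over the full month, Ireland accumulates 84 distinct users and clears the hypothetical threshold, which illustrates why a narrow date range is more likely to trigger thresholding. Real Google Analytics 4 thresholds can still apply over longer ranges.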

In this post, I’ll explain data thresholding in more detail, give examples of when it is most likely to apply to your Google Analytics 4 reports, and offer solutions to reduce its impact.

Is there anything you can do to reduce the impact of data thresholding in Google Analytics 4?

If your Google Analytics 4 configuration has Google Signals enabled, then the likelihood of your Google Analytics 4 reports triggering data thresholding increases significantly.

Enabling Google Signals within Google Analytics 4 has many benefits for marketers, particularly for retargeting audiences in Google Ads. Conversely, heavy data thresholding can create multiple sources of truth and undermine confidence in your data, because report output is based on a subset of users.

This is a difficult balance to judge, but there are some solutions at hand to help.

  • If your Google Analytics 4 configuration has Google Signals enabled, but you are not using Google Signals for remarketing purposes, then we recommend disabling Google Signals within Google Analytics 4. Disabling Google Signals will significantly reduce the impact of data thresholding, and is a reasonable trade-off compared with keeping it enabled to enrich your Google Analytics 4 dataset with demographic data (which is not captured for every user).
  • If your Google Analytics 4 configuration has Google Signals enabled, and you are using Google Signals for remarketing purposes, our recent blog post on reporting identities offers a solution. By switching your reporting identity to device-based, you can significantly reduce the likelihood of data thresholding whilst still taking advantage of what Google Signals offers for remarketing. Note that if you use more advanced Google Analytics 4 features such as User-ID, the device-based identity will not use User-ID in user calculations. Even so, this approach offers a workable trade-off against the negative impacts of data thresholding.
  • If your Google Analytics 4 configuration has Google Signals enabled, you are using Google Signals for remarketing purposes, and you have the BigQuery data export configured, then you have an additional option to consider. Because Google Signals data is not part of the BigQuery dataset, you could instead rebuild your reports in BigQuery and return values that are not subject to data thresholding. This would mean a significant shift away from reporting in the Google Analytics 4 interface towards building all of your Google Analytics 4 reporting from BigQuery, which requires new skills and expertise within your team and should be considered very carefully.
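For teams weighing the BigQuery route, the Python sketch below builds a BigQuery Standard SQL query that counts distinct users per country directly from the Google Analytics 4 export. It assumes the standard export layout of daily `events_YYYYMMDD` tables in an `analytics_<property_id>` dataset, with the `user_pseudo_id`, `user_id` and `geo.country` fields. The `users_by_country_sql` function, the project ID and the dataset ID are hypothetical placeholders. Counting `user_pseudo_id` approximates a device-based identity. The optional `COALESCE(user_id, user_pseudo_id)` key is only a rough stand-in for User-ID reporting and does not reproduce Google Analytics 4's identity handling, so totals should not be expected to match the interface exactly.

```python
from datetime import date


def users_by_country_sql(
    project: str,
    dataset: str,
    start: date,
    end: date,
    prefer_user_id: bool = False,
) -> str:
    """Build a BigQuery Standard SQL query counting distinct users per country
    from the GA4 export's daily events_YYYYMMDD tables (hypothetical helper)."""
    if start > end:
        raise ValueError("start date must not be after end date")
    # user_pseudo_id is the device-level identifier in the export. COALESCE with
    # user_id is a rough approximation only: it does not reproduce GA4's
    # identity handling, so counts will differ from the GA4 interface.
    user_key = "COALESCE(user_id, user_pseudo_id)" if prefer_user_id else "user_pseudo_id"
    return (
        "SELECT\n"
        "  geo.country AS country,\n"
        f"  COUNT(DISTINCT {user_key}) AS users,\n"
        "  COUNT(*) AS event_count\n"
        f"FROM `{project}.{dataset}.events_*`\n"
        # _TABLE_SUFFIX holds the YYYYMMDD part of each daily table name.
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'\n"
        "GROUP BY country\n"
        "ORDER BY users DESC"
    )


# Placeholder project and dataset IDs; substitute your own.
sql = users_by_country_sql("my-project", "analytics_123456789", date(2023, 2, 1), date(2023, 2, 28))
print(sql)
```

The function only builds the query string, which keeps it testable without credentials. The resulting SQL could then be run with the `google-cloud-bigquery` client library, for example via `bigquery.Client().query(sql).result()`.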

How do you know if a report in Google Analytics 4 is subject to data thresholding?

When running either a pre-built report or an exploration in Google Analytics 4, you can check whether it is subject to data thresholding using the data quality indicator shown at the top of the report.

In the example above, you can see that thresholding has been applied to the report. In other words, any underlying data that could have revealed the identity of a user has been hidden, and the results returned are based on a subset of data.

What is the impact of data thresholding?

Google Analytics 4 reports that are subject to data thresholding will be undercounted and under-represented. This is because any data that could be used to infer the identity of an individual user is hidden from the reporting output.

The hidden data is not replaced or modelled, meaning that the output of your report is based on a subset of users.

At the time of writing, Google Analytics 4 does not report the impact of thresholding within its interface (i.e. it does not show how much data has been hidden). This makes it difficult to quantify the impact of data thresholding when taking report data at face value.

What steps should you take if you are seeing data thresholding in Google Analytics 4 reports?

If you are experiencing data thresholding in Google Analytics 4, follow any of the steps below to mitigate the impact.

  • Check the date range of your Google Analytics 4 report. A narrow date range can produce low user or event counts, which can trigger data thresholding. Expanding your date range may help to overcome it.
  • Consider disabling Google Signals in Google Analytics 4. If you have Google Signals enabled but are not using its data for remarketing purposes, the benefits of the integration are small compared with the reduction in data thresholding you gain by disabling it.
  • Switch to a device-based reporting identity. If you have Google Signals configured within Google Analytics 4 and are using its data for remarketing, your Google Analytics 4 reports are more likely to be subject to data thresholding and are likely experiencing the issues outlined in this article. Switching to a device-based reporting identity significantly reduces the impact of data thresholding whilst still letting you use Google Signals data for remarketing, giving you a practical, pragmatic compromise.
  • Utilise Google BigQuery. If switching to a device-based reporting identity isn’t suitable for you (e.g. you use User-ID in Google Analytics 4 and want it included in your user calculations), and you have the BigQuery export in place, you have the additional option of recreating your Google Analytics 4 reports in BigQuery. Google Signals data is not exported to BigQuery, so your BigQuery dataset will not be subject to data thresholding and will give you a more complete view of your data. This requires a big shift in how you report, and a new skill set in your team, so it should be considered carefully.
  • Speak to us. If you have any further questions on data thresholding, please get in contact.