Date posted

20 Mar 2023

Google Analytics 4

The five measurement myths of Google Analytics 4

With the full roll-out of Google Analytics 4 fast approaching, it is becoming a race against time to ensure that Google Analytics 4 properties are collecting data reliably.

Your business, understandably, needs confidence in using Google Analytics 4 data to optimise marketing activity, creative assets, and the on-site user experience. 

This confidence can easily be eroded when you spot anomalies in your Google Analytics 4 data compared with the Universal Analytics dataset that you have used and trusted for a long time to make marketing decisions.

For example, you may have noticed that your Google Analytics 4 configuration is recording user, session or conversion data that is higher or lower than Universal Analytics by more than an acceptable tolerance.

With fundamental changes in the Google Analytics 4 data model, comparing the output of your Google Analytics 4 dataset vs Universal Analytics is not an apples to apples comparison. The changes in how data is collected and reported in Google Analytics 4 can have a significant impact on how you use, interpret and analyse your data.

This article seeks to debunk five popular measurement myths of Google Analytics 4, and highlights why an apples to apples comparison against Universal Analytics is not recommended:

Myth 1: The users metric is comparable across Universal Analytics and Google Analytics 4

Myth 2: The sessions metric is comparable across Universal Analytics and Google Analytics 4

Myth 3: The Google Analytics 4 interface reports on actual user and session counts

Myth 4: Conversion volumes are comparable across Universal Analytics and Google Analytics 4

Myth 5: Querying the same data in the Google Analytics 4 interface and Google BigQuery will return the same results

This content was also created for MeasureSummit 2023 and is available to watch on demand.

Myth 1: The users metric is comparable across Universal Analytics and Google Analytics 4

An apples to apples comparison of the users metric across Google Analytics 4 and Universal Analytics is not recommended due to the following measurement nuances:

Google Analytics 4 reports on active users

Google Analytics 4 has a brand new ‘active users’ metric. An active user is defined as a user who has either:

  • Had an engaged session: An engaged session is defined as a session that has either lasted 10 seconds or longer, or has had two or more page or screen views, or has triggered at least one conversion event.

OR

  • The first_visit or first_open event is collected: These events are also used to determine the new users metrics.

Put simply, the active users metric does not include non-engaged returning users.
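The definition above can be sketched in a few lines of Python. This is an illustration only, not how Google Analytics 4 is implemented: the `Session` class and its field names are hypothetical, and the thresholds are simply those quoted in the definition.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical, simplified view of one session
    duration_seconds: float
    views: int                              # page or screen views
    conversion_events: int
    has_first_visit_or_open: bool = False   # first_visit / first_open collected


def is_engaged(session: Session) -> bool:
    """Engaged if any one of the three conditions in the definition holds."""
    return (
        session.duration_seconds >= 10
        or session.views >= 2
        or session.conversion_events >= 1
    )


def is_active_user(sessions: list[Session]) -> bool:
    """Active if any session is engaged or carries first_visit/first_open."""
    return any(is_engaged(s) or s.has_first_visit_or_open for s in sessions)


# A returning user with one short, single-page session is not counted as active.
returning_bounce = [Session(duration_seconds=4, views=1, conversion_events=0)]
# A brand new user with the same short session is counted, via first_visit.
new_bounce = [Session(4, 1, 0, has_first_visit_or_open=True)]

print(is_active_user(returning_bounce))  # False
print(is_active_user(new_bounce))        # True
```

The two example users behave identically on site, yet only the new user contributes to the active users metric.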

When viewing standard (i.e. pre-built) reports in the Google Analytics 4 interface you will see the users metric, and it would be logical to assume this is showing total users. However, this metric shows active users, not total users.

Therefore, user counts in standard Google Analytics 4 reports may appear lower than anticipated, as active users do not include non-engaged returning users. The magnitude of difference between active users and total users will depend on how frequently your users return to the site, and how engaged those returning users are.

The total users metric is only available in the explorations module of Google Analytics 4. Therefore, be careful when comparing user metrics across explorations and standard reports, and ensure any comparisons are like-for-like using the active users metric.

Reporting identity affects user count

In addition to the nuances around active users and total users as described above, the reporting identity that is used within your Google Analytics 4 configuration will impact how Google Analytics 4 calculates users.

As a quick reminder, there are three reporting identity options in Google Analytics 4:

  • Blended: This is the default reporting identity when you configure a Google Analytics 4 property. This combines observed and modelled data.
  • Observed: This is similar to blended but does not include modelled data.
  • Device based: This is based on the device-id method and is most similar to how Universal Analytics calculates users.

If you are using the blended or observed reporting identities, you should expect differences in how users are calculated, as the methodology is different compared to Universal Analytics. Even if you are using the device based identity, you will experience differences because standard reports in Google Analytics 4 now show active users and not total users.

Myth 2: The sessions metric is comparable across Universal Analytics and Google Analytics 4

It is not only the users metric where there are nuances in measurement to consider. 

An apples to apples comparison of the sessions metric across Google Analytics 4 and Universal Analytics is not recommended due to the following measurement nuances:

Session count does not increase when a new campaign is detected

In Google Analytics 4, a change of source, medium or campaign mid-session does not start a brand new session. It is now possible to have multiple source/medium/campaign combinations associated with a single session in Google Analytics 4.

This is significant, as in Universal Analytics such a change would start a brand new session. This was particularly problematic where utm tracking was incorrectly used on internal site links, as session counts would increment every time a utm tracked link was clicked, overwriting the source, medium and campaign each time.

Whilst this is no longer an issue with Google Analytics 4, the advice remains not to use utm tracking on internal site links. Instead, use the event based data model in Google Analytics 4 to your advantage by configuring events for when users click on your internal site links.

The impact of this change is that session counts may appear lower in Google Analytics 4, particularly if your use of utm tracking to record internal site links is high.
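As a rough illustration of the difference, the sketch below counts sessions for one continuous visit under each rule. The function, the flag name and the campaign values are all hypothetical, and the inactivity timeout is deliberately ignored so that only the campaign rule is in play.

```python
def count_sessions(hit_campaigns: list[str], new_session_on_campaign_change: bool) -> int:
    """Count sessions for one continuous visit, given the campaign seen on each hit."""
    if not hit_campaigns:
        return 0
    sessions = 1
    for previous, current in zip(hit_campaigns, hit_campaigns[1:]):
        # Universal Analytics-style rule: a changed campaign starts a new session.
        if new_session_on_campaign_change and current != previous:
            sessions += 1
    return sessions


# One visit where two internal links were (incorrectly) tagged with utm parameters.
visit = ["spring_email", "spring_email", "homepage_banner", "footer_promo"]

ua_style = count_sessions(visit, new_session_on_campaign_change=True)
ga4_style = count_sessions(visit, new_session_on_campaign_change=False)

print(ua_style)   # 3
print(ga4_style)  # 1
```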

Late hits are processed in a longer timeframe

A late hit is defined as a hit that is not sent immediately. For example, a user may be browsing your website on a mobile device and lose mobile service. They then revisit your site again at a later point in time.

In Universal Analytics, a late hit is only processed if within a four hour timeframe. So, in the example above, if the user revisited the site within four hours of losing mobile service they would be processed as a late hit. If this timeframe was beyond four hours, the late hit would not be processed or recorded within your Universal Analytics data.

In Google Analytics 4, the window for processing a late hit has extended from 4 hours to 72 hours. 

Consequently, you are more likely to see variations in your reported session figures due to this wider processing window. For example, if you were analysing session counts for the previous week, the same report could show a different session count when run today than it did when run yesterday, as a result of late hits being processed in the meantime.

Because Google Analytics 4 is processing late hits from a wider timeframe, session counts may be higher in Google Analytics 4.
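The effect of the two windows can be sketched as follows. The window lengths are those quoted above; the function name, the example delays and the treatment of the boundary as inclusive are assumptions made purely for illustration.

```python
from datetime import timedelta

UA_LATE_HIT_WINDOW = timedelta(hours=4)
GA4_LATE_HIT_WINDOW = timedelta(hours=72)


def is_processed(delay: timedelta, window: timedelta) -> bool:
    """A late hit is kept only if it arrives inside the processing window."""
    return delay <= window


# Hypothetical delays between a hit being generated and being sent.
delays = [timedelta(minutes=30), timedelta(hours=6), timedelta(hours=80)]

ua_processed = [is_processed(d, UA_LATE_HIT_WINDOW) for d in delays]
ga4_processed = [is_processed(d, GA4_LATE_HIT_WINDOW) for d in delays]

print(ua_processed)   # [True, False, False]
print(ga4_processed)  # [True, True, False]
```

The hit delayed by six hours is the interesting case: it is dropped under the four hour window but recorded under the 72 hour window.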

Session count does not increase when a user’s session straddles two days

Let’s take the example of a user landing on your website at 23:55 and leaving your website at 00:05.

In Universal Analytics, if a user is on the website when midnight arrives, a new session is started. In other words, the above example would be recorded as two sessions.

In Google Analytics 4, a new session is not started when midnight arrives. In other words, the above example would be recorded as one session.

The impact of this change is that session counts may be lower than anticipated in Google Analytics 4.
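The 23:55 to 00:05 example can be expressed as a short sketch. The function and flag names are hypothetical, and the hits are assumed to be in time order and close enough together that no inactivity timeout applies.

```python
from datetime import datetime


def count_sessions(hits: list[datetime], split_at_midnight: bool) -> int:
    """Count sessions for one continuous visit, optionally splitting at midnight."""
    if not hits:
        return 0
    sessions = 1
    for previous, current in zip(hits, hits[1:]):
        # Universal Analytics-style rule: crossing into a new day starts a new session.
        if split_at_midnight and current.date() != previous.date():
            sessions += 1
    return sessions


hits = [
    datetime(2023, 3, 19, 23, 55),
    datetime(2023, 3, 19, 23, 59),
    datetime(2023, 3, 20, 0, 5),
]

ua_style = count_sessions(hits, split_at_midnight=True)
ga4_style = count_sessions(hits, split_at_midnight=False)

print(ua_style)   # 2
print(ga4_style)  # 1
```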

Myth 3: The Google Analytics 4 interface reports on actual user and session counts

User and session counts are estimated in the Google Analytics 4 interface, using a statistical algorithm named HyperLogLog++ (also known as HLL++).

Measuring exact distinct user and session counts for large datasets requires significant memory usage, and the processing of this information can affect the speed and performance of the reporting in the Google Analytics 4 interface and associated APIs. 

In order to balance the impact of processing vs reporting performance, HLL++ is used in Google Analytics 4 to estimate user and session counts within the interface and the data API. This ensures better reporting performance, as less memory is used, whilst providing user and session counts with higher estimation accuracy and lower margins of error.

Therefore, the user and session counts shown in the Google Analytics 4 interface can differ from the exact counts. The reported figure could be higher or lower depending on the outcome of the HLL++ algorithm for your underlying dataset.
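To give a feel for why an estimate can land slightly above or below the true figure, here is a minimal sketch of the plain HyperLogLog algorithm. It is not HLL++ (which adds refinements such as a sparse representation and bias correction) and it is not Google's implementation; the class name, the choice of SHA-256 as the hash and the precision of 12 are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog: estimates distinct counts using a fixed set of registers."""

    def __init__(self, precision: int = 12) -> None:
        self.p = precision
        self.m = 1 << precision          # number of registers (4096 when p = 12)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # use 64 bits of the hash
        index = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # standard constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


sketch = HyperLogLog()
for i in range(50_000):
    sketch.add(f"user-{i}")

estimate = sketch.count()
print(round(estimate))  # an approximation of 50,000, not an exact count
```

The sketch only ever stores 4,096 small integers, however many users are added, which is the memory saving described above. The price is an estimate that typically sits within a few percent of the true count rather than matching it exactly.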

Myth 4: Conversion volumes are comparable across Universal Analytics and Google Analytics 4

It is not only user and session data where underlying methodologies have changed.

An apples to apples comparison of conversion volumes across Google Analytics 4 and Universal Analytics is not recommended due to the following measurement nuances:

Conversions in a single session are no longer deduplicated

A popular way of capturing conversions in Universal Analytics was via the use of goals. Goals no longer exist in Google Analytics 4, which has an important bearing on how conversion volumes are counted and reported.

Goals worked in a unique way in Universal Analytics, in that multiple completions of a goal in a single session were deduplicated. For example, if a goal was configured for a user reaching a login page, and this page was viewed three times in a single session, Universal Analytics would record this as one goal completion and not three.

This deduplication does not happen within Google Analytics 4 (in Google Analytics 4 the above would be recorded as three conversions), because goals do not exist. Furthermore, metrics such as unique events or unique pageviews, which are typically deduplicated at a session level, also do not exist in the Google Analytics 4 schema, meaning there is no easy alternative way to perform the deduplication.

The impact of this change is that conversion volumes are likely to be showing as larger in Google Analytics 4, as they are not being deduplicated at a session level. This difference could be notable if multiple actions of your goals are regularly occurring within a single session.
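The login page example can be reproduced with a small sketch. The event name, session ids and function names below are hypothetical; the point is only the difference between counting every conversion event and counting at most one per session.

```python
# (session_id, event_name) pairs; names are illustrative only
events = [
    ("s1", "login_page_view"),
    ("s1", "login_page_view"),
    ("s1", "login_page_view"),
    ("s2", "login_page_view"),
    ("s2", "add_to_cart"),
]


def ga4_style_conversions(events: list[tuple[str, str]], conversion_event: str) -> int:
    """Every occurrence of the conversion event is counted."""
    return sum(1 for _, name in events if name == conversion_event)


def ua_goal_style_completions(events: list[tuple[str, str]], conversion_event: str) -> int:
    """At most one completion per session, as Universal Analytics goals behaved."""
    return len({session_id for session_id, name in events if name == conversion_event})


print(ga4_style_conversions(events, "login_page_view"))      # 4
print(ua_goal_style_completions(events, "login_page_view"))  # 2
```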

Smart goals are not supported in Google Analytics 4

Smart goals in Universal Analytics use machine learning to examine signals about website sessions that indicate the likelihood of a conversion. Google Ads campaigns can then be optimised based on these signals.

If you are using smart goals in Universal Analytics, then please be aware that they are no longer supported in Google Analytics 4. Utilise new features such as predictive audiences to predict the future conversion probability of your website users.

Google Analytics 4 uses a data-driven attribution model

Google Analytics 4 uses data-driven attribution as its default out-of-the-box attribution model. This is a shift from Universal Analytics, which used a single touch last non-direct click model.

With Google Analytics 4 using a more sophisticated attribution model that is based on multiple touchpoints, you should expect to see differences in the number of conversions being reported at a source, medium or campaign level. Volumes could be higher or lower depending on your users’ paths to conversion.

Furthermore, because a multiple touchpoint attribution model is being used, it is normal in Google Analytics 4 to see fractional (decimal) conversions at a source, medium or campaign level.
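The sketch below shows how fractional conversions arise. Data-driven attribution itself is a proprietary model and is not reproduced here; a simple linear (equal credit) model is used as a stand-in purely to illustrate how any multi-touch model splits one conversion across channels. The channel names and function names are hypothetical, and a non-empty path is assumed.

```python
def last_non_direct_click(path: list[str]) -> dict[str, float]:
    """Single touch: all credit goes to the last channel that is not direct."""
    non_direct = [channel for channel in path if channel != "direct"]
    winner = non_direct[-1] if non_direct else path[-1]
    return {winner: 1.0}


def linear(path: list[str]) -> dict[str, float]:
    """Multi-touch stand-in: one conversion is shared equally across touchpoints."""
    share = 1.0 / len(path)
    credit: dict[str, float] = {}
    for channel in path:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


# One conversion, preceded by four touchpoints
path = ["paid_search", "email", "organic", "direct"]

print(last_non_direct_click(path))  # {'organic': 1.0}
print(linear(path))  # {'paid_search': 0.25, 'email': 0.25, 'organic': 0.25, 'direct': 0.25}
```

Under the single touch model each channel is credited with whole conversions only. Under any multi-touch model, a channel report will naturally contain values such as 0.25.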

Attribution is a complex concept in Google Analytics 4. We have produced an in-depth guide on attribution modelling in Google Analytics 4 for more information.

Myth 5: Querying the same data in the Google Analytics 4 interface and Google BigQuery will return the same results

One of the fantastic additions of Google Analytics 4 is democratising the Google BigQuery daily export. With this functionality now available to all, there will be many instances of marketers running reports and advanced analysis in both the Google Analytics 4 interface and Google BigQuery. A natural expectation is that comparable report queries will return the same results across each dataset.

However, please be aware that querying the same data in the Google Analytics 4 interface and Google BigQuery has the potential to return different and conflicting results. 

Effective querying of Google Analytics 4 data in Google BigQuery will require both technical expertise in writing complex SQL, and also deep analytical expertise in understanding the structure and nuances of the Google Analytics 4 dataset. It will require more traditional web analysts to upskill in advanced querying via SQL, or will require technical specialists with SQL skills to learn the concepts of the underlying marketing data in Google Analytics 4.  

Some examples of where comparable reporting queries can provide differences are shown below:

Active users vs total users

Running a straightforward SQL query to produce a count of distinct user_pseudo_id values in Google BigQuery will not return the same count as the users metric in the Google Analytics 4 interface. This is because the users metric in Google Analytics 4 standard (i.e. pre-built) reports is based on active users as opposed to total users.

Calculating active users in Google BigQuery is possible, but requires a more sophisticated SQL query to do so. There are excellent resources on the ga4bigquery.com website which explain how this can be approached. This is one use case where advanced SQL skills are required to fully utilise the Google Analytics 4 and Google BigQuery integration.
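The gap between the two counts can be illustrated in Python on a handful of hypothetical, already flattened export rows. The `session_engaged` field is assumed here to have been unnested from the event parameters into a simple boolean, and the active user rule is the simplified definition used earlier in this article, so treat this as a sketch of the logic rather than a query that reproduces the interface.

```python
# Hypothetical flattened rows from the events export
rows = [
    {"user_pseudo_id": "A", "event_name": "first_visit", "session_engaged": False},
    {"user_pseudo_id": "A", "event_name": "page_view", "session_engaged": False},
    {"user_pseudo_id": "B", "event_name": "page_view", "session_engaged": True},
    {"user_pseudo_id": "C", "event_name": "page_view", "session_engaged": False},
]

FIRST_EVENTS = {"first_visit", "first_open"}

# The "straightforward" count: every distinct user_pseudo_id
total_users = {row["user_pseudo_id"] for row in rows}

# Active users: engaged session, or a first_visit / first_open event collected
active_users = {
    row["user_pseudo_id"]
    for row in rows
    if row["session_engaged"] or row["event_name"] in FIRST_EVENTS
}

print(len(total_users))   # 3
print(len(active_users))  # 2
```

User C is a non-engaged returning user, so they appear in the simple distinct count but not in the active users figure.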

Data estimation

As referenced earlier in this article, user and session data in the Google Analytics 4 interface is estimated using the HyperLogLog++ algorithm (HLL++) to ensure an acceptable trade-off of memory usage and reporting processing/speed.

When running queries using the Google BigQuery dataset, you are accessing the raw data directly which is not subject to estimation via the HLL++ algorithm. Therefore, it is expected that seemingly comparable data outputs from the Google Analytics 4 interface and Google BigQuery will not match.

Source/medium querying

The Google Analytics 4 data export to Google BigQuery lacks source/medium data at the session level. 

Source/medium is available at a user and event level. To analyse source/medium data at the session level in Google BigQuery requires complex (and potentially expensive) querying of the event level data. It is another example where advanced SQL skills are required to fully utilise the Google Analytics 4 and Google BigQuery integration.

There are excellent resources on this topic on tanelytics.com and ga4bigquery.com both of which are a highly recommended read if you are trying to perform this type of analysis in Google BigQuery, and are not sure where to start.
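As a starting point, the sketch below shows the general shape of the problem in Python: group events by user and session, then derive one source/medium per session. The rows are hypothetical and already flattened, and the rule used (take the first event in the session that carries a source) is one possible convention and an assumption here, not necessarily the logic the Google Analytics 4 interface applies.

```python
from typing import Optional

# Hypothetical flattened event rows; source/medium are empty on most events
events = [
    {"user_pseudo_id": "A", "ga_session_id": 1, "event_timestamp": 100, "source": None, "medium": None},
    {"user_pseudo_id": "A", "ga_session_id": 1, "event_timestamp": 200, "source": "google", "medium": "cpc"},
    {"user_pseudo_id": "A", "ga_session_id": 1, "event_timestamp": 300, "source": "newsletter", "medium": "email"},
    {"user_pseudo_id": "A", "ga_session_id": 2, "event_timestamp": 900, "source": None, "medium": None},
]


def session_source_medium(events: list[dict]) -> dict[tuple, Optional[str]]:
    """Return one source/medium per (user, session): the first one observed."""
    result: dict[tuple, Optional[str]] = {}
    for event in sorted(events, key=lambda e: e["event_timestamp"]):
        key = (event["user_pseudo_id"], event["ga_session_id"])
        result.setdefault(key, None)
        if result[key] is None and event["source"] is not None:
            result[key] = f'{event["source"]} / {event["medium"]}'
    return result


by_session = session_source_medium(events)
print(by_session[("A", 1)])  # google / cpc
print(by_session[("A", 2)])  # None
```

Sessions with no source on any event come back as `None`; how those are labelled (for example as direct traffic) is a reporting decision left to the analyst.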

Google Signals data

Within your Google Analytics 4 configuration you may have Google Signals data enabled as part of your remarketing strategy. 

Google Signals also unifies users across different devices for users signed-in to Google. For example, a user signed-in to Google who visited your website on a desktop and a mobile would be unified as a count of one user. If that same user visited your website but was not signed-in to Google, they would be classed as two users.

Google Signals is not included within the data exported from Google Analytics 4 to Google BigQuery. Therefore you may expect higher user counts to be returned from analysis conducted in Google BigQuery, as this dataset cannot unify cross-device visits from Google signed-in users.
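A toy example makes the direction of the difference clear. The records and field names below are entirely hypothetical (the real Google Signals matching is not exposed to you); the sketch only contrasts counting by device with counting by a unified identity where one is known.

```python
# Hypothetical visits: google_account is only known for signed-in users
visits = [
    {"device_id": "desktop-1", "google_account": "user-42"},
    {"device_id": "mobile-7", "google_account": "user-42"},
    {"device_id": "tablet-3", "google_account": None},
]

# Device based counting, as in the BigQuery export: one user per device id
device_based_users = len({visit["device_id"] for visit in visits})

# Unified counting: fall back to the device id only when no account is known
unified_users = len({visit["google_account"] or visit["device_id"] for visit in visits})

print(device_based_users)  # 3
print(unified_users)       # 2
```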

It is important that any analysis or reporting of data that crosses both the Google Analytics 4 interface and the Google BigQuery dataset is treated carefully and with caution. Even when running more advanced queries to align data, there can still be genuine differences across the two datasets as described above.

In summary

Developing confidence in the integrity and reliability of the Google Analytics 4 dataset will be high on the agenda of most marketers and analysts right now. 

Observing differences in Google Analytics 4 data vs the Universal Analytics dataset will likely cause questions and confusion, which can erode confidence in using Google Analytics 4 data to make marketing decisions.

Given the fundamental differences in the data model and technology used in Google Analytics 4, there are a number of instances where there can be legitimate differences in data, and this article has hopefully provided some insight into them. These differences could result in your user, session or conversion data returning higher or lower values than you expected.

The key takeaway from this article is that Universal Analytics vs Google Analytics 4 is not an apples to apples comparison and should not be treated as such.

This increases the importance of ensuring that your Google Analytics 4 configuration is built on robust measurement foundations and principles.  If you have any further questions on the measurement myths in this article, or if you would like further advice on the integrity and robustness of your Google Analytics 4 configuration, please get in contact with us.