Showing posts with label analytics. Show all posts

Feb 12, 2024

Understanding the gcd parameter in GA4 network requests

I have a problem: I don't remember things :) That's why my Google Calendar is full of memos even for personal stuff I'm supposed to remember. 

Given this premise, you can probably imagine how much I hate the new gcd parameter, the parameter associated with Consent Mode v2 and populated with the consent state of the new signals ad_user_data and ad_personalization.

The gcd parameter in GA4 network requests

For those who don't know what I'm talking about, I strongly recommend having a look at this guide on Simo Ahava's blog, authored by Markus Baersch, to get a deeper understanding of Consent Mode v2 and the gcd parameter.

For the sake of this post, I will summarize here the main components of this parameter. The value passed in the gcd parameter consists of: 

  • first two integers, e.g. 11
  • followed by a letter to represent the status of the ad_storage consent signal, for instance p
  • followed by another integer separating the above consent signal from the next one, e.g. 1
  • a letter to represent the status of the analytics_storage consent signal, e.g. another p
  • integer, e.g. 1
  • letter - e.g. p - for the ad_user_data consent signal
  • integer, 1
  • letter - p - for the ad_personalization consent signal
  • finally, another integer to close the string, for instance 5.

Taking the above example, the value passed in the gcd parameter would be 11p1p1p1p5.
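As a rough sketch of the structure listed above, the value can be split into one letter per consent signal with a regular expression. The function name split_gcd and the pattern are illustrative assumptions based only on the layout described in this post (two digits, then four letter/digit pairs); real values may deviate from it.

```python
import re

# Signals in the order the post lists them inside the gcd value.
SIGNALS = ("ad_storage", "analytics_storage", "ad_user_data", "ad_personalization")

# Assumed layout: two leading digits, then four (letter, digit) pairs.
# The digit of the last pair is the one that closes the string.
GCD_PATTERN = re.compile(r"^(\d{2})([a-z])(\d)([a-z])(\d)([a-z])(\d)([a-z])(\d)$")


def split_gcd(value):
    """Return {signal: letter}, or None if the value does not fit the assumed layout."""
    match = GCD_PATTERN.match(value)
    if match is None:
        return None
    letters = match.group(2, 4, 6, 8)
    return dict(zip(SIGNALS, letters))


print(split_gcd("11p1p1p1p5"))
```

For the example value, every signal maps to the letter p. What each letter means is a separate question, covered later in this post.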

I don't know what people think about this parameter, but I find it a bit confusing, and the meaning of the letters representing consent signals is hard to remember since they are not limited to the p I've used in my example: there are 9 of them!

How beautiful is the gcs parameter instead

The good old gcs parameter is instead straightforward, in my opinion, since it uses binary digits to represent the two main consent signals, ad_storage and analytics_storage. It's dead simple:
  • G1 is always there, thus can be ignored
  • then there's the ad_storage consent signal, either 1 if it's been granted or 0 if not
  • finally there's the analytics_storage consent signal, again 1 or 0 depending on whether consent has been granted or not.
How simple is that? I immediately read that, for instance, G101 means analytics_storage is granted while ad_storage is not, and so on for any other possible combination.
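To make the reading rule concrete, here is a minimal sketch that decodes a gcs value following the three bullets above. The function name parse_gcs is an illustrative assumption, and the check for the G1 prefix simply reflects the statement that it is always there.

```python
def parse_gcs(value):
    """Decode a gcs value such as 'G101' into granted (True) / denied (False) flags."""
    is_well_formed = (
        len(value) == 4
        and value.startswith("G1")
        and all(char in "01" for char in value[2:])
    )
    if not is_well_formed:
        raise ValueError(f"unexpected gcs value: {value!r}")
    return {
        "ad_storage": value[2] == "1",         # third character
        "analytics_storage": value[3] == "1",  # fourth character
    }


print(parse_gcs("G101"))
```

For G101 this reports ad_storage as denied and analytics_storage as granted, matching the reading given above.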

The trick I found to remember how to read the gcd parameter 

Serendipity, as usual, has shown me the way again this time. I was playing with a table taken from the guide I mentioned on Simo's blog, when sorting the rows on Google Sheets I realized there was a pattern.

The table taken from Simo's blog (credits)

A simple alphabetical sorting in the Letter column unveils there are 3 letter groups: 

  1. the 1st letter group (l, m, n) represents consent signals where no default command is recorded at all
  2. the 2nd one (p, q, r) represents consent signals where the default command sets to denied
  3. the 3rd one (t, u, v) represents consent signals where the default command sets to granted.
Then, each letter of each letter group represents the consent signal after the update command has been executed. Therefore:
  1. letters in the 1st position (l, p, t) represent consent signals where the update command is missing 
  2. letters in the 2nd position (m, q, u) represent consent signals where the update command sets to denied
  3. letters in the 3rd position (n, r, v) represent consent signals where the update command sets to granted.

I know this can still be a bit confusing so the table below is worth a thousand words.

The table that helped me understand how to read the gcd parameter
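The same pattern can also be written down as a small lookup, built only from the two lists above: the letter group gives the default state and the position inside the group gives the update state. The names STATES, LETTER_GROUPS and LETTER_MEANING are illustrative assumptions.

```python
# States in the order used by both lists above.
STATES = ("missing", "denied", "granted")

# One letter group per default state, in the same order as STATES.
LETTER_GROUPS = ("lmn", "pqr", "tuv")

# Build {letter: (default_state, update_state)}:
# the group picks the default state, the position in the group picks the update state.
LETTER_MEANING = {
    letter: (default_state, update_state)
    for default_state, group in zip(STATES, LETTER_GROUPS)
    for update_state, letter in zip(STATES, group)
}

for letter, (default_state, update_state) in LETTER_MEANING.items():
    print(f"{letter}: default {default_state}, update {update_state}")
```

For example, LETTER_MEANING["r"] reads as default denied, update granted.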


I guess there are other people who already found that pattern and are using it to read the gcd parameter. Some other people are instead ignoring this topic altogether. 

However, I hope there's someone else who will find this tip useful for day-to-day debugging.

Jan 22, 2024

Validate how UTM tags are assigned to default channel groups in GA4

Last summer I attended the GA4 Summit event organized by Tag Manager Italia, here in Italy. In particular I remember a talk by Steen Rasmussen titled "Campaign tracking and attribution in GA4 - Tips, tricks and tactics", an inspiring one - as always by Steen, as far as I can tell - about the new importance of campaign tracking caused by the transition to GA4.

Starting from there, in the last few months I focused my attention on campaign tracking, especially how to tag campaigns and how they are assigned to GA4 default channel groups. Some tools came in handy, for instance the one by Steve Lamar, but then, taking the course Query GA4 Data In Google BigQuery by Johan van de Werken and Simo Ahava, I realized there was a chance to programmatically optimize the validation of lists of UTM tags, which is quite an expensive task.

That's why I decided to take a step forward, using Python and Colab to create a script with this purpose in mind. This post is about how the script I created works and how to use it, for anyone interested.

How to use the GA4 UTM tags classifier

For those experienced in Python who want to dig deeper into how the script works, please jump to the next section of this post.
For those scared by the code, take a breath, you can simply ignore it since Colab will do everything for you. Please keep reading this section since it includes a step by step guide on how to use my script.
For both, the script is available at this link.

1. Open the script in Colab 

On the GitHub page I provided above there's a blue button "Open in Colab": that's the recommended way to rapidly use the script without installing Python or other boring stuff :)

2. Sign-in with a Google account

Then click the folder icon on the left side panel of Colab. After a while you will see a folder named "sample_data" appearing but you can ignore it since we need to upload our own files there.

Clicking the folder icon in Colab to add files

3. Upload the files to process

We need two files to make the script work.
  • The 1st one is a list of GA4 source categories, the one Google references in its documentation, available at this link. Unfortunately it's a PDF, which is crazy in my opinion considering it's even mentioned as a spreadsheet by Google itself. This is why I copy/pasted the content of the PDF in a real spreadsheet that I'm making available at this link (safely stored on Google Drive).
  • The 2nd file is another spreadsheet with the list of source, medium and campaign parameters to validate. This must have three columns named exactly "Source", "Medium" and "Campaign".
Both files have to be named exactly as I did. Therefore, the 1st one must be named "NK_AnalyticsFTW_GA4SourceCategories.xlsx", the 2nd one "Check.xlsx".

4. Run all code cells in the Colab notebook

To do so, you have to click Runtime / Run all found at the top navigation bar in Colab.

Clicking Run All in the Colab notebook to execute the script

5. Get the results file and enjoy! 

Once the script ends its execution you will find another file inside the Files section of Colab. Just download the file and start your assessment in Excel or any spreadsheet software of your choice.
If you don't see a file named "results.xlsx" immediately, just hit the Refresh button as in the snapshot below.

Clicking the refresh button in Colab to update the Files section

Explanation of the GA4 UTM tags classifier, for coders

In this section I will explain the script almost line by line to show how it works. This is especially important for experienced coders who want to help review my code or just customize it.

If you're wondering, I don't think I'm a good Python coder, so let me clarify that most of the code has been created using ChatGPT and then I added something here and there to make it work as I wanted.

Disclaimer: Google Blogger does not allow me to paste code blocks easily so the excerpts you're going to find here are just meant as references. 

1. Import python libraries

We just need a couple of simple libraries for this script to work:
  • pandas manages data frames
  • re is the library we need to process the RegEx Google uses to interpret some parameters.

2. Import GA4 source categories file

Here we load the file with the list of source categories taken from Google documentation. For those jumping right here, as I previously wrote, I copy/pasted the content of the PDF in a spreadsheet I'm making available at this link (safely stored on Google Drive).

# Load the uploaded Excel file
file_path = '/content/NK_AnalyticsFTW_GA4SourceCategories.xlsx'

# Read the first sheet into a DataFrame (the 'source' and 'source category' columns are used below)
values_df = pd.read_excel(file_path)

After reading the file, the code cell filters rows, creating the following sublists: SOURCE_CATEGORY_SHOPPING, SOURCE_CATEGORY_SEARCH, SOURCE_CATEGORY_SOCIAL and SOURCE_CATEGORY_VIDEO.

# Filtering the values from the Excel file where 'source category' is 'SOURCE_CATEGORY_SHOPPING'
filtered_shopping_sources = values_df[values_df['source category'] == 'SOURCE_CATEGORY_SHOPPING']['source'].tolist()

3. Upload the file to check

The next cell reads the file with the list of source, medium, campaign parameters we want to assess. 

uploaded_file_path = '/content/Check.xlsx'
df = pd.read_excel(uploaded_file_path, na_filter=False)

4. Map the input UTM tags

Finally, the last cell in the notebook is the main part of the script. 

The first part gives a name to each RegEx pattern to use afterwards. I've decided to keep each pattern distinct, even if some are actually duplicates of others, since I wanted to be prepared for any possible future change.

# Define the regex patterns
# Some of them are duplicates but they could change in the future
paid_shopping_campaign_pattern = r'^(.*(([^a-df-z]|^)shop|shopping).*)$'
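To get a feel for what this pattern accepts, here is a small check against a few campaign names. The sample names are made up for illustration. The character class [^a-df-z] rules out every lowercase letter except e, so "shop" counts when it sits at the start of the name, after an e, or after a non-letter such as an underscore; "shopping" counts anywhere.

```python
import re

paid_shopping_campaign_pattern = r'^(.*(([^a-df-z]|^)shop|shopping).*)$'

# Hypothetical campaign names, already lowercased as the script does.
samples = ['summer_shop', 'eshop_promo', 'shopping_deals', 'workshop', 'brand_awareness']

for campaign in samples:
    matched = re.match(paid_shopping_campaign_pattern, campaign) is not None
    print(f'{campaign}: {matched}')
```

The first three names match, while 'workshop' does not because the letter right before "shop" is excluded by the character class.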

Then, comes the core function check_condition.

The first rows of the function take the columns of the Excel file and convert their content to lowercase. I have to thank Luka Cempre for this part, since looking at the SQL code he posted on LinkedIn I noticed my code was missing it.

Source = row['Source'].lower()
Medium = row['Medium'].lower()
Campaign = row['Campaign'].lower()

An if, elif, else statement is in charge of assigning the parameters to GA4 default channel groups. The order of the lines is the same that can be found in the table "Channels for manual traffic" in Google Analytics Help.

if Source == '(direct)' and Medium in ['(not set)', '(none)']:
    return 'Direct'

elif Campaign == 'cross-network':
    return 'Cross-network'

elif Source in filtered_shopping_sources or (re.match(paid_shopping_campaign_pattern, Campaign) and re.match(paid_shopping_medium_pattern, Medium)):
    return 'Paid Shopping'


In the code above, I extracted some lines from the function to show how the Direct, Cross-network and Paid Shopping channel groups are assigned to source, medium and campaign parameters. Both the RegEx patterns shown earlier in this section and the lowercased values kick in here to make the code more readable.
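For anyone who wants to try these three branches in isolation, here is a cut-down, self-contained sketch. Several things in it are assumptions and not part of the original notebook: the two-item filtered_shopping_sources list is a stand-in for the list read from the spreadsheet, the medium pattern is the one I believe Google documents for paid channels, rows are plain dictionaries instead of DataFrame rows, and every rule not shown in the excerpt is replaced by returning None.

```python
import re

# Hypothetical stand-in for the list read from the source categories spreadsheet.
filtered_shopping_sources = ['amazon', 'ebay']

paid_shopping_campaign_pattern = r'^(.*(([^a-df-z]|^)shop|shopping).*)$'
# Assumption: this pattern is not shown in the post; check it against Google's documentation.
paid_shopping_medium_pattern = r'^(.*cp.*|ppc|retargeting|paid.*)$'


def check_condition(row):
    """Classify one row using only the three branches shown in the excerpt."""
    source = row['Source'].lower()
    medium = row['Medium'].lower()
    campaign = row['Campaign'].lower()

    if source == '(direct)' and medium in ['(not set)', '(none)']:
        return 'Direct'
    elif campaign == 'cross-network':
        return 'Cross-network'
    elif source in filtered_shopping_sources or (
        re.match(paid_shopping_campaign_pattern, campaign)
        and re.match(paid_shopping_medium_pattern, medium)
    ):
        return 'Paid Shopping'
    # The remaining channel rules are omitted from this sketch.
    return None


rows = [
    {'Source': '(direct)', 'Medium': '(none)', 'Campaign': ''},
    {'Source': 'google', 'Medium': 'CPC', 'Campaign': 'Summer_Shop'},
]
for row in rows:
    print(row, '->', check_condition(row))
```

Because the function only indexes the row by column name, the same body also works when pandas passes each row in through df.apply with axis=1.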

Finally, this code cell applies the Python function to the rows to check, and the data frame is exported to an Excel file named "results.xlsx".

# Apply the function to each row
df['Default channel group'] = df.apply(check_condition, axis=1)

# Display the DataFrame

# Export the DataFrame
df.to_excel("results.xlsx", sheet_name='UTM tagging classification', index=False)

Final thoughts

I hope the script I created will help classify UTM parameters into GA4 channel groups as smoothly as possible!
I find Google's documentation hard to understand for average marketers, and all the steps required to do so are too time-consuming.

Hopefully, Google will change something in the future, but for now, my little script can be one of the ways to go.

Nov 5, 2023

Custom Event named as an Automatic Event will duplicate events in GA4

Some days ago I found a question on Reddit asking what happens when a Custom Event and a GTM GA4 Event have the same event_name. I wanted to be 100% sure about the answer, so I decided to do a pretty simple test that I will show in the next lines.

The most complete guide I’ve found about duplicate events in Google Analytics 4 is of course something by Julius:

Unfortunately, that wasn’t enough for what I wanted to prove, so I hope I can help someone by sharing how I tested it, tackling precisely that question.

How I set GTM to duplicate an automatic event

First things first, I set a GA4 event tag on Google Tag Manager replicating an automatic event. I’ve chosen the GA4 event page_view, but it could have been anything else.

Triggers were exactly the same to be sure they fired at the same time (see snapshot below).

The Google Tag - sending a page_view - and the parallel custom GA4 page_view event I set

The GA4 event tag was set by adding just a custom parameter to let me easily recognize what I was looking at in the DebugView.

The GA4 event tag is set with a custom parameter foo with value bar

Results: Will both automatic and custom GA4 events fire?

I haven’t published my new version of the GTM container but simply used the Google Tag Assistant to test the results in the GA4 DebugView.

The results were straightforward. Both the automatic page_view and the custom page_view were fired. The first page_view event was the automatic one while the second contained the custom parameter foo.

page_view automatic event automatically created by GA4

page_view custom event with custom parameter foo

Here we go, a complete test to successfully prove what happens when automatic events get duplicated with custom ones. There’s no deduplication on Google’s side, so look out!

Aug 4, 2023

How to convert the Google Tag Manager JSON to a spreadsheet overcoming Excel limits with KNIME

Most of the time you can easily convert a Google Tag Manager JSON file to an Excel one with some free tools.

The best tool of this kind is probably the Google Sheets extension “GTM Tools” by the almighty Simo Ahava.

Unfortunately, the free superpowers of this add-on can run into bitter moments where things go wrong, as with any other tool.

The issue with Google Sheets characters limit in a single cell

I was trying to wrap my head around a GTM setup so I decided to use GTM Tools to simplify the overview of tags and triggers.

During the processing phase of the GTM container assessed, I got a strange error message breaking the workflow.

The error message raised by GTM Tools Google Sheets extension

The issue is that one of the tags was more than 50k characters long - I know that sounds crazy - which exceeds the maximum number of characters Google Sheets allows in a single cell!

KNIME to the rescue… maybe

I’ve used KNIME in the past to automate some tasks. One of the most interesting things I published on KNIME Hub was a simple workflow taking a GTM JSON file and converting it to an Excel one.

I must be honest, I’ve usually preferred GTM Tools since it’s a bit more hassle-free, a couple of clicks and you’re there. In this case, I thought my little KNIME application using Excel instead of Google Sheets could have made my day.

Unfortunately that was not the case, since Microsoft Excel has its own limit of 32k characters per cell, which is even lower than the Google Sheets one!

The error message raised by the KNIME Excel Writer node 😟

How I made it work

The easiest way I found to make it work was to edit the KNIME application so that the cell content of the infamous tag is split into multiple columns before the Excel Writer node runs.

Below is how the workflow looks after adding a couple more KNIME nodes: the Cell Splitter By Position node, in charge of splitting the content, and the Column Filter node, used to remove the original column with the extra-large content.

The final KNIME workflow with the Cell Splitter By Position and the Column Filter nodes in the 3rd branch

In particular, to make it work, I set 4 split indices to split the content when it reaches 30k, 60k, 90k or 120k characters.

For each split the node will create a new column. The original column coming from the GTM JSON was named Details so I decided to just name them Details1, Details2 and so on, Details5 being the last one at the 4th split.
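Outside KNIME, the same idea can be sketched in a few lines of Python. This is only an analogy to what the Cell Splitter By Position node does with the 4 split indices described above, not its actual implementation; the function name split_by_positions and the 130k-character sample string are illustrative assumptions.

```python
def split_by_positions(text, positions):
    """Cut text at the given character positions; returns len(positions) + 1 pieces."""
    starts = [0, *positions]
    ends = [*positions, None]  # None lets the last piece run to the end of the text
    return [text[start:end] for start, end in zip(starts, ends)]


details = "x" * 130_000  # hypothetical oversized cell content
pieces = split_by_positions(details, [30_000, 60_000, 90_000, 120_000])

# Name the new columns Details1 ... Details5, as in the workflow.
columns = {f"Details{i}": piece for i, piece in enumerate(pieces, start=1)}

for name, piece in columns.items():
    print(name, len(piece))
```

Each of the first four pieces is 30k characters long, so it stays below the Excel cell limit. The last piece is open-ended, so content much longer than 150k characters would need more split indices.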

Configuration of the Cell Splitter By Position node

The table previews generated by KNIME are quite straightforward about the results of the node.

The table preview before applying the Cell Splitter By Position node
The table preview after applying the Cell Splitter By Position node (we will remove the original Details column with the next node)

And that worked like a charm. Now my GTM JSON file perfectly fits a simple Excel file!

Feb 21, 2023

Summary of Google Analytics 4 Certification program on Google Skillshop

This post comes from the notes I wrote down before taking the Google Analytics 4 Certification. 

To be honest, I think this time the Google Analytics material is much longer and more verbose than it was in the Google Analytics Academy for the previous certification.

I believe this post is a bit too long as well, however it could be helpful for those who want to take the Google Analytics 4 Certification and would like a summary of the material provided by Google on its Skillshop website. This is the reason why I'm sharing it here exactly as I wrote it for me.

I strongly recommend copy/pasting this summary into a Google Docs/Microsoft Word file and using the headings to reshape the text structure as desired.



Jan 26, 2023

How to tie sessions to transactions tracked via Measurement Protocol for GA4

Today, I'm going to show a simple yet powerful experiment I've made with Google Analytics Measurement Protocol. 

I was looking for ways to do this test with Python or R but the Google Analytics 4 Event Builder proved to be more than enough. 

Where it all started

I'm working with a client who needs to send purchase events to Google Analytics through Measurement Protocol (long story short).

After a first implementation we found everything was correct, but purchase events were not tied to any information about the channel driving them.

Transactions are coming from Unassigned

God bless Measure Slack

There are no words to describe how Measure Slack helps with these issues: It's awesome! 

In this particular case I wasn't actively looking for an answer to my problem but, all of a sudden, I found a person with the exact same issue I had. WOW.

The first reply to his question was the answer to my issue!

First reply in Measure Slack pointing towards Google documentation

As a matter of fact, in its documentation at this link, Google recommends:
In order for user activity to display in standard reports like Realtime, engagement_time_msec and session_id must be supplied as part of the params for an event.

Time to test with the GA4 Event Builder

I trust people in Measure Slack but I needed to test things out to be sure there were no drawbacks and to be confident passing this information to the developer team. 

My plan was to send events through the Event Builder with and without the engagement_time_msec and session_id parameters to see what could happen. 

Here follows an example payload I've made using the GA4 Event Builder (with those two parameters populated): 

An example payload with engagement_time_msec and session_id parameters

There are two things to keep in mind when populating those fields:
  1. engagement_time_msec can be anything so I decided to put a simple 1 in there (UPDATE: Simo actually tested and reported in the same Slack thread that this parameter is probably not needed at all)
  2. session_id should instead be something real so I had to take this information from a debugging session with Google Analytics Debugger, the Chrome extension available here.
Where to find the session_id in Google Analytics DebugView
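For those who prefer code to the Event Builder, here is a minimal sketch of the same kind of payload built in Python. It only assembles and prints the request; nothing is sent. The measurement ID, API secret, client_id, session_id and transaction values are hypothetical placeholders, and the function name build_purchase_payload is an assumption for illustration.

```python
import json

# Hypothetical placeholders: use the values of your own GA4 data stream.
MEASUREMENT_ID = "G-XXXXXXXXXX"
API_SECRET = "your_api_secret"


def build_purchase_payload(client_id, session_id, transaction_id, value, currency,
                           engagement_time_msec=1):
    """Build a Measurement Protocol body for one purchase event tied to a session."""
    return {
        "client_id": client_id,
        "events": [
            {
                "name": "purchase",
                "params": {
                    "session_id": session_id,  # taken from a real session
                    "engagement_time_msec": engagement_time_msec,  # a simple 1, as above
                    "transaction_id": transaction_id,
                    "value": value,
                    "currency": currency,
                },
            }
        ],
    }


endpoint = (
    "https://www.google-analytics.com/mp/collect"
    f"?measurement_id={MEASUREMENT_ID}&api_secret={API_SECRET}"
)
payload = build_purchase_payload(
    client_id="1234567890.1234567890",  # hypothetical client_id
    session_id="1706270000",            # hypothetical session_id
    transaction_id="T-0001",
    value=10.0,
    currency="EUR",
)

print(endpoint)
print(json.dumps(payload, indent=2))
```

The printed JSON is what would go in the body of a POST request to the printed endpoint.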


I verified that all the transactions had been sent to GA4 with a simple check on the Real Time report.

transaction_ids tracked in the Real Time report

Then I waited some hours to dive deeper into the transactions tracked. The next day I found sessionization had worked astonishingly well.

Session default channel groups with right attribution of transactions

I actually sent 3 purchase events without engagement_time_msec and session_id and just one with both of them.

Indeed, the 3 purchase events without them were not sessionized and got attributed to Unassigned, while the one with both additional parameters was correctly associated with Direct.