<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BigQuery Archives | Informed Iteration</title>
	<atom:link href="https://informediteration.com/tag/bigquery/feed/" rel="self" type="application/rss+xml" />
	<link>https://informediteration.com/tag/bigquery/</link>
	<description>More Value and Less Stress From Your Data</description>
	<lastBuildDate>Wed, 01 Apr 2026 18:24:51 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://informediteration.com/wp-content/uploads/2018/12/cropped-logo-1-32x32.png</url>
	<title>BigQuery Archives | Informed Iteration</title>
	<link>https://informediteration.com/tag/bigquery/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Complying with Strict ActAs Permission Enforcement in Dataform</title>
		<link>https://informediteration.com/complying-with-strict-actas-permission-enforcement-in-dataform/</link>
					<comments>https://informediteration.com/complying-with-strict-actas-permission-enforcement-in-dataform/#respond</comments>
		
		<dc:creator><![CDATA[JF Amprimoz]]></dc:creator>
		<pubDate>Mon, 12 Jan 2026 17:11:50 +0000</pubDate>
				<category><![CDATA[GCP - BigQuery - Dataform]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Dataform]]></category>
		<guid isPermaLink="false">https://informediteration.com/?p=1118</guid>

					<description><![CDATA[<p>Depending on how carefully you read automated emails from Google Cloud Platform, you may be preparing for the change to Strict Act-As enforcement in Dataform, or you might be looking to resolve an issue caused by it. Depending on when you read this post, you might also just want to know how to set up [&#8230;]</p>
<p>The post <a href="https://informediteration.com/complying-with-strict-actas-permission-enforcement-in-dataform/">Complying with Strict ActAs Permission Enforcement in Dataform</a> appeared first on <a href="https://informediteration.com">Informed Iteration</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Depending on how carefully you read automated emails from Google Cloud Platform, you may be preparing for the change to Strict Act-As enforcement in Dataform, or you might be looking to resolve an issue caused by it. Depending on when you read this post, you might also just want to know how to set up your Dataform Releases and Workflows in a way that complies with the Act-As requirements.</p>



<h2 class="wp-block-heading">What Is Strict Act-As Enforcement in Dataform GCP, and What Is the Impact</h2>



<p>Strict Act-As is an extra security check that used to be optional. It prevents Workflows from running as the default Dataform Service Agent, which forces GCP users to be more deliberate about who can create or modify which Dataform Workflows.</p>



<figure class="wp-block-pullquote"><blockquote><p> <mark style="background-color:#d58080" class="has-inline-color has-black-color">“Existing Dataform, BigQuery Notebook, BigQuery Pipelines, and BigQuery data preparation workflows using the Dataform service agent will stop running.”</mark></p><cite>Google Email to Dataform Users</cite></blockquote></figure>



<p>Instead, you need to create a Custom Service Account and configure it to run your Workflows. We’ll go through that step by step below.</p>



<h3 class="wp-block-heading">Default Dataform Service Agent VS Dataform Custom Service Account</h3>



<p>Note that we will use the terms Default Dataform Service Agent and Custom Service Account a lot. The Agent is the principal Dataform creates to run its jobs. The Custom Account is the principal we create for the Agent to act as, so that Workflow runs are explicitly authorized.</p>



<p>As a very loose analogy, think of:</p>



<ul class="wp-block-list">
<li>the Workflow as a building project</li>



<li>the Default Agent as a tradesperson</li>



<li>the Custom Account as the project manager</li>



<li>Google as the construction company</li>
</ul>



<p>Traditionally, Google would let tradespeople (Agents) just walk onto the building site (Workflow) and get to work, but they want to tighten security. As a result, they are introducing a policy that only project managers (Service Accounts) on that building (Workflow) have keys to the site and the authority to request that the tradespeople (Agents) do work.</p>



<p>For all our building projects (Workflows) where the tradespeople (Default Dataform Agent) are still doing stuff directly, we need to hire a project manager (Custom Service Account), and tell the tradespeople to report to them.</p>



<h2 class="wp-block-heading">Key Dates</h2>



<h3 class="wp-block-heading">2026-01-19</h3>



<p>New repositories created on or after this date will have Strict Act-As enforcement turned on, with no option to disable it, which means you won’t be able to have Workflows run as the default Dataform Service Agent.</p>



<h3 class="wp-block-heading">2026-04-29 &#8211; 2026-07-31</h3>



<p>Existing repositories will gradually have Strict Act-As enforcement turned on. If you haven’t set things up correctly by then, your scheduled Workflow runs will stop.</p>



<h2 class="wp-block-heading">Create or Modify a Dataform Workflow to Work with Strict Act-As Enforcement</h2>



<p>Whether you are making a new Workflow or changing an existing one, the Act-As part of things is pretty much identical. We’ll:</p>



<ol class="wp-block-list">
<li>Make the Custom Service Account </li>



<li>Give it required permissions</li>



<li>Give the Dataform Service Agent and appropriate users access to the Custom Service Account</li>



<li>Create or configure a Workflow to be run by the Custom Service Account</li>



<li>Turn on Act-As enforcement and test</li>
</ol>



<p>Even nicer, the UI design anticipates that people will want to do steps 1&#8211;3 above together, so we don’t have to wander from page to page to do them. That assumes you are fine with granting things at the project level &#8211; you can get more or less granular, but we are keeping things simple for today.</p>



<h3 class="wp-block-heading">The Default Dataform Service Agent</h3>



<p>The instructions assume that you have already <a href="https://docs.cloud.google.com/dataform/docs/create-repository">created a Dataform repo</a>, and by extension, a Dataform Service Agent. The latter has the default format:</p>



<p><code>service-(aBunchOfNumbers)@gcp-sa-dataform.iam.gserviceaccount.com</code></p>



<p>If you changed the default, make a note of what you used instead, as we’ll need it in a bit.</p>



<h3 class="wp-block-heading">1. Create a Custom Service Account to Run Dataform Workflows</h3>



<figure class="wp-block-image size-large is-style-default"><a href="https://informediteration.com/wp-content/uploads/2026/01/Service-accounts-–-IAM-Admin-–.png"><img fetchpriority="high" decoding="async" width="1024" height="424" src="https://informediteration.com/wp-content/uploads/2026/01/Service-accounts-–-IAM-Admin-–-1024x424.png" alt="Creating a Service Account in Google Cloud Platform" class="wp-image-1126" srcset="https://informediteration.com/wp-content/uploads/2026/01/Service-accounts-–-IAM-Admin-–-1024x424.png 1024w, https://informediteration.com/wp-content/uploads/2026/01/Service-accounts-–-IAM-Admin-–-300x124.png 300w, https://informediteration.com/wp-content/uploads/2026/01/Service-accounts-–-IAM-Admin-–-768x318.png 768w, https://informediteration.com/wp-content/uploads/2026/01/Service-accounts-–-IAM-Admin-–.png 1140w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<ol class="wp-block-list">
<li>Search for “service”</li>



<li>Click on Service Accounts</li>



<li>On the Service Accounts screen, click on Create service account</li>
</ol>



<figure class="wp-block-image size-full"><a href="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-IAM-Admin-–-.png"><img decoding="async" width="1020" height="937" src="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-IAM-Admin-–-.png" alt="Naming a Service account for Dataform in BigQuery" class="wp-image-1125" srcset="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-IAM-Admin-–-.png 1020w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-IAM-Admin-–--300x276.png 300w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-IAM-Admin-–--768x706.png 768w" sizes="(max-width: 1020px) 100vw, 1020px" /></a></figure>






<p>Choose a name, provide a description, and change the autofilled ID if you want. Hit Create and Continue.</p>



<h3 class="wp-block-heading">2. Set Custom Service Account Permissions</h3>



<figure class="wp-block-image size-large"><a href="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–Permissions.png"><img decoding="async" width="961" height="1024" src="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–Permissions-961x1024.png" alt="Screencapture of Service Account Permissions to use Dataform" class="wp-image-1123" srcset="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–Permissions-961x1024.png 961w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–Permissions-282x300.png 282w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–Permissions-768x818.png 768w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–Permissions.png 1020w" sizes="(max-width: 961px) 100vw, 961px" /></a></figure>






<p>Always defer to your org’s security experts when it comes to setting cloud platform permissions. Some roles commonly granted in this context are listed below.</p>



<ol class="wp-block-list">
<li>BigQuery Job User &#8211; Dataform can’t run BQ queries without this one.</li>



<li>BigQuery Data Editor &#8211; Allows Dataform to read, modify, and create BQ tables and datasets, and delete tables.</li>



<li>BigQuery Data Viewer &#8211; Allows Dataform read-only access to tables (if granting permissions at the table or dataset level, it can be a good idea to limit permissions to Data Viewer on source tables you wouldn’t want to accidentally modify &#8211; see the SQL sketch after this list).</li>



<li>BigQuery Data Owner &#8211; Use this instead of Data Editor if you need Dataform to be able to delete entire datasets.</li>
</ol>
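

<p>If you’d rather scope these roles to specific datasets instead of granting them project-wide, BigQuery also lets you grant them with SQL. Here’s a minimal sketch, assuming a hypothetical Custom Service Account email and hypothetical dataset names &#8211; swap in your own:</p>


<pre class="wp-block-code"><code>-- Hypothetical names throughout; use your own project, datasets,
-- and the Custom Service Account email you just created.

-- Read/write on the dataset Dataform builds tables in.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `your_project.reporting_dataset`
TO "serviceAccount:dataform-runner@your_project.iam.gserviceaccount.com";

-- Read-only on a source dataset you wouldn't want accidentally modified.
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `your_project.source_dataset`
TO "serviceAccount:dataform-runner@your_project.iam.gserviceaccount.com";</code></pre>


<p>Note that BigQuery Job User can’t be granted at the dataset level, so that one stays on the project.</p>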



<p>Click Continue.</p>



<h3 class="wp-block-heading">3. Give Access to the Custom Service Account</h3>



<figure class="wp-block-image size-large"><a href="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-Principals-with-access.png"><img loading="lazy" decoding="async" width="961" height="1024" src="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-Principals-with-access-961x1024.png" alt="Give principles access to Dataform Service Account" class="wp-image-1124" srcset="https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-Principals-with-access-961x1024.png 961w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-Principals-with-access-282x300.png 282w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-Principals-with-access-768x818.png 768w, https://informediteration.com/wp-content/uploads/2026/01/Create-service-account-–-Principals-with-access.png 1020w" sizes="auto, (max-width: 961px) 100vw, 961px" /></a></figure>






<p>The first principal we should give access to is the Dataform Service Agent that was created when you set up the Dataform repo. Then we’ll give access to ourselves, and anyone else who needs to configure Releases or Workflows.</p>



<ol class="wp-block-list">
<li>Type part of the Agent ID email into the “Service account users role” field and click on the result that pops up. The default, once again, looks like this:</li>
</ol>



<p><code>service-(aBunchOfNumbers)@gcp-sa-dataform.iam.gserviceaccount.com</code></p>



<ol start="2" class="wp-block-list">
<li>Add other users based on their Google Accounts. In most cases they should be granted the “Service account user” role, but if you want them to be able to change who has access to the Service Account, or to change the Service Account itself, including deleting it, put their Google Account email in the “Service account admin” box.</li>



<li>Click Done</li>
</ol>



<p>That will return you to the list of Service Accounts. Click on the new Service Account and we’ll make sure everything worked as needed. Hit the Permissions tab and the Manage Access button to confirm the permissions you want are there.</p>



<h4 class="wp-block-heading">Provide Dataform Service Agent with the Token Creator Role</h4>



<p>We need to give one more permission to the default Service Agent on the Custom Service Account.</p>



<figure class="wp-block-image size-large"><a href="https://informediteration.com/wp-content/uploads/2026/01/dataform-runner-–-IAM-Admin-–Google-Cloud-console.png"><img loading="lazy" decoding="async" width="1024" height="968" src="https://informediteration.com/wp-content/uploads/2026/01/dataform-runner-–-IAM-Admin-–Google-Cloud-console-1024x968.png" alt="Screenshot of Dataform Service Account Principals with Access" class="wp-image-1129" srcset="https://informediteration.com/wp-content/uploads/2026/01/dataform-runner-–-IAM-Admin-–Google-Cloud-console-1024x968.png 1024w, https://informediteration.com/wp-content/uploads/2026/01/dataform-runner-–-IAM-Admin-–Google-Cloud-console-300x284.png 300w, https://informediteration.com/wp-content/uploads/2026/01/dataform-runner-–-IAM-Admin-–Google-Cloud-console-768x726.png 768w, https://informediteration.com/wp-content/uploads/2026/01/dataform-runner-–-IAM-Admin-–Google-Cloud-console.png 1150w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>






<ol class="wp-block-list">
<li>Go to the “Principals with access” tab</li>



<li>Check the “Include Google-provided role grants” box</li>



<li>The Service Agent already has the Service Account User role, but it will also need the Service Account Token Creator role</li>



<li>Click the pencil to edit the Service Agent’s roles</li>
</ol>



<p>On the flyout, click “+ Add another role” and search for “Service Account Token Creator.” Click on it, then the Save button.</p>



<h3 class="wp-block-heading">4. Set Your Dataform Workflow to Use the Custom Service Account</h3>



<p>Whether you are just switching over existing workflows or creating new ones, there is only one field in the configuration flyout we need to worry about.</p>



<p>Now, it is probably a good idea to create a test Workspace (with some kind of compilation override), Branch, Release, and Workflow while getting the kinks out. We’re assuming you’ve already made the first three, as Strict Act-As enforcement doesn’t change those steps.</p>



<p>Assuming you are still in the repo you want to switch over, head to the “Releases &amp; scheduling” tab, and click “+ Create” to make a new Workflow.</p>



<figure class="wp-block-image size-full"><a href="https://informediteration.com/wp-content/uploads/2026/01/Releases-scheduling-–-Dataform-–-BigQuery-–-screen-6.png"><img loading="lazy" decoding="async" width="755" height="903" src="https://informediteration.com/wp-content/uploads/2026/01/Releases-scheduling-–-Dataform-–-BigQuery-–-screen-6.png" alt="Changing a Dataform Workflow to work with Strict ActAs" class="wp-image-1127" srcset="https://informediteration.com/wp-content/uploads/2026/01/Releases-scheduling-–-Dataform-–-BigQuery-–-screen-6.png 755w, https://informediteration.com/wp-content/uploads/2026/01/Releases-scheduling-–-Dataform-–-BigQuery-–-screen-6-251x300.png 251w" sizes="auto, (max-width: 755px) 100vw, 755px" /></a></figure>






<ol class="wp-block-list">
<li>Set it to use the test Release</li>



<li>Search for and/or choose the Custom Service Account you made earlier.&nbsp;</li>



<li>You might get a warning that isn’t very useful, like the one shown here, or you might get an error message with helpful information.</li>
</ol>



<p>When I was testing, I was blocked from using the Custom Service Account, and the error message told me it was because it needed access to a table in another project to run the workflow in question, making it easy for me to address the issue.&nbsp;</p>



<p>This digression is just to remind you to keep an eye out for anything like that, because the instructions above give the new Service Account permissions only on the currently selected project.</p>



<p>Resolve any errors, assess any warnings, save, and do a manual run. Then wait for a scheduled one to go through as well.</p>



<p>All green? Awesome. There’s one more thing we need to test though.</p>



<h3 class="wp-block-heading">5. Testing with Strict-Act As Enforcement Before the Deadlines</h3>



<p>To make sure your Workflows will run once Google turns on Strict Act-As enforcement, you may want to turn it on manually beforehand and test.&nbsp;</p>



<p>Note that you can only change this setting at the repository level. If you have multiple Workflows in your repo, any that you haven’t switched to the Custom Service Account will fail to run. I’d switch them all to the new Service Account before changing the repo-level setting. For now, you can switch Act-As enforcement back and forth on existing repos, but I’m not sure how Google will handle that after the 19th.</p>



<p>Once all your Workflows are ready:</p>



<figure class="wp-block-image size-large"><a href="https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1.png"><img loading="lazy" decoding="async" width="1024" height="646" src="https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1-1024x646.png" alt="Screenshot Switch a Dataform repo to enforce ActAs permission checks." class="wp-image-1135" srcset="https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1-1024x646.png 1024w, https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1-300x189.png 300w, https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1-768x484.png 768w, https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1-1536x968.png 1536w, https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1-540x340.png 540w, https://informediteration.com/wp-content/uploads/2026/01/screen-7-–-Settings-–-Dataform-–-BigQuery-Google-Cloud-console-1-1.png 1724w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>






<ol class="wp-block-list">
<li>Hit the rightmost Settings tab in your repo.</li>



<li>You can update the Service Account default to the new Custom one. The Workflow level settings will override this default, but removing the old Service Agent default will reduce the chance of someone using it by accident. Click the pencil to bring up a flyout, select the new account, and save.</li>



<li>Click the pencil across from “actAs permission checks” to change it.</li>



<li>Select “Enforce actAs permission checks” and click Save.</li>
</ol>



<p>You can now run your workflows manually or wait for the scheduled ones to run and keep an eye on the results.</p>



<p>Have you run into any problems with Strict actAs permission enforcement, or found any different solutions? Let me know in the comments or use the contact page to get in touch. </p>



<p>Before I let you go, I got guidance on dealing with this Google Cloud Platform change from Ken Williams at my usual <a href="https://dive.team/">source for GCP info, DiveTeam.</a></p>



<p>The post <a href="https://informediteration.com/complying-with-strict-actas-permission-enforcement-in-dataform/">Complying with Strict ActAs Permission Enforcement in Dataform</a> appeared first on <a href="https://informediteration.com">Informed Iteration</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://informediteration.com/complying-with-strict-actas-permission-enforcement-in-dataform/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Slash Your GA4 BigQuery Bills By Only Overwriting Recent Data</title>
		<link>https://informediteration.com/slash-your-ga4-bigquery-bills-by-only-overwriting-recent-data/</link>
					<comments>https://informediteration.com/slash-your-ga4-bigquery-bills-by-only-overwriting-recent-data/#comments</comments>
		
		<dc:creator><![CDATA[JF Amprimoz]]></dc:creator>
		<pubDate>Fri, 17 May 2024 15:17:08 +0000</pubDate>
				<category><![CDATA[GCP - BigQuery - Dataform]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Google Analytics]]></category>
		<guid isPermaLink="false">https://informediteration.com/?p=1065</guid>

					<description><![CDATA[<p>Are your BigQuery costs starting to creep up now that you’ve had Google Analytics 4 connected for a while? Do you have scheduled queries that overwrite the destination table on every run? This post will look at a relatively easy way to run scheduled queries on GA4 data without having to query all the way [&#8230;]</p>
<p>The post <a href="https://informediteration.com/slash-your-ga4-bigquery-bills-by-only-overwriting-recent-data/">Slash Your GA4 BigQuery Bills By Only Overwriting Recent Data</a> appeared first on <a href="https://informediteration.com">Informed Iteration</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Are your BigQuery costs starting to creep up now that you’ve had Google Analytics 4 connected for a while? Do you have scheduled queries that overwrite the destination table on every run? This post will look at a relatively easy way to run scheduled queries on GA4 data without having to query all the way back to day 1.</p>
<p>Google updates GA4 data for a few days after it is initially gathered, so a lot of the methods that we might ordinarily use to add only new data to a BigQuery table won’t work. Much of the introductory material available correctly guides us to just overwrite the entire output table when scheduling queries.</p>
<p>This is the easiest way to get accurate data, and back when we’d just connected GA4 to BigQuery, the costs of doing it for even pretty busy websites were negligible. But as time goes on and more data accumulates, re-querying old data that hasn’t changed gets more and more expensive. Here’s a better way that isn’t too complicated.</p>
<h2>What You’ll Need</h2>
<p>We’re assuming that you have scheduled queries you run against the whole date range of your GA4 BigQuery export, and that these queries overwrite the results table when they run.</p>
<p>If you need help with getting your reporting up and running through BigQuery, consider this <a href="https://testandlearn.community/learning-groups/prep-google-analytics-data-for-reporting-in-bigquery">course on how to Prep GA4 Data in BigQuery for Reporting</a>. Note that it doesn’t just cover the basics &#8211; later sections of the course will take you through more sophisticated stuff than we do in this post, but they build up to it.</p>
<h2>The Plan: Delete and Insert the Last 7 Days of Data, Every Day</h2>
<p>Note that there are lots of ways to do this, and some are more robust and sophisticated, but we are going to take the easy route wherever possible. Instead of trying to append new rows, update changed rows, and delete removed rows, we’re just going to delete the last seven days of data from our destination table, then query the last seven days from the export into our destination table.</p>
<p>This might seem heavy handed, but if you are coming from overwriting the entire date range, it’s a much lighter approach.</p>
<p>We’re also going to add a bit of a safety valve. Sometimes Google Analytics bugs out and data has to be backfilled, so we will still overwrite the whole table occasionally. It will also clear up some minor inconsistencies that might arise in your data. This clever idea came from June Li at ClickInsight. We’ve got things set up to do this quarterly in the example below, but you’ll see how you can modify that.</p>
<h2>Copy Your Current Tables</h2>
<p>We’re going to start by creating test versions of the tables you want to update. The easiest way I found to do this was to take the query I currently use to generate the table, wrap it in parentheses, and stick a CREATE command in front of it.</p>
<pre>CREATE TABLE destination_dataset.destination_table_name
PARTITION BY Date
AS (
  SELECT
    CAST(event_date AS DATE FORMAT 'YYYYMMDD') AS Date,
    -- ...
  FROM `your_project.analytics_XXXXXXXXX.events_*`
  WHERE _table_suffix BETWEEN
    -- your first day of data as a YYYYMMDD string, eg 20240101
    'your_first_day_of_data_as_YYYYMMDD' AND
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)</pre>
<p>Give your tables a clear name, but note that if you keep them in the same dataset as your production tables, you can just rename them when you are done testing and ready to switch over.</p>
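<p>When that time comes, the rename itself is a one-statement DDL operation. A minimal sketch, using placeholder names:</p>
<pre>-- Placeholder names; RENAME TO keeps the table in the same dataset.
ALTER TABLE destination_dataset.destination_table_name_test
RENAME TO destination_table_name;</pre>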
<p>Next, instead of <a href="/partition-tables-for-lower-ga4-bigquery-and-looker-studio-costs/">configuring partitioning as part of the scheduled query UI</a> , we are going to define our partitioning in the query itself.</p>
<p>Then comes <code>AS (</code>, followed by the original query and the closing parenthesis. Obviously, the query needs to include the partition column (which you can name however you please; ours is just called Date).</p>
<p>Beyond that, you can go to town!</p>
<h2>Wait, Should I Test This First?</h2>
<p>I highly recommend testing the new versions of your queries against the old ones, because, well, you should always check data for accuracy when you change how it’s processed. In my testing I did detect some minor inconsistencies in one of my queries. In a query to generate a pageview table with session level attribution, I noticed a very small number of pageviews got attributed differently.</p>
<p>I’m guessing these were part of sessions that spanned midnight of the cutoff point seven days ago, and the original attribution data from the beginning of the session was lost. In the case of this query and the data it was running against, the difference was negligible, but that might not always be the case. And, we will run the full overwrite of the tables quarterly, which will keep these inconsistencies from accumulating.</p>
<h2>What’s the Easiest Way to Test This?</h2>
<p>Here’s the testing plan I used:</p>
<ol>
<li>Create copies of scheduled query result tables (done above)</li>
<li>Check table row counts to make sure they are identical across test copies and production originals (see the sketch after this list)</li>
<li>Create scheduled queries to run on the last seven days for a day or two, do a full refresh, then run on the last seven days for another day or two (we go into this in detail later).</li>
<li>You can check your results at any point in the process, but waiting through a couple normal seven day pulls, then a full overwrite, and another couple seven day pulls, reproduces both date ranges for the query, in their natural sequence.</li>
<li>Check tables for row level discrepancies using the <a href="https://medium.com/google-cloud/bigquery-table-comparison-cea802a3c64d">query shared here</a> by Mark Scannell</li>
</ol>
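<p>For the row count check in step 2, something like the sketch below works (table names are placeholders). Counting rows per day makes it easy to see exactly where the copies diverge, and an EXCEPT DISTINCT pass surfaces row-level differences:</p>
<pre>-- Placeholder table names; swap in your production table and test copy.
-- Row counts per day, per source.
SELECT 'production' AS source, Date, COUNT(*) AS row_count
FROM `your_project.destination_dataset.destination_table`
GROUP BY source, Date
UNION ALL
SELECT 'test' AS source, Date, COUNT(*) AS row_count
FROM `your_project.destination_dataset.destination_table_test`
GROUP BY source, Date
ORDER BY Date, source;

-- Rows in production that have no exact match in the test copy.
SELECT * FROM `your_project.destination_dataset.destination_table`
EXCEPT DISTINCT
SELECT * FROM `your_project.destination_dataset.destination_table_test`;</pre>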
<p>You can also connect the tables to copies of reports you currently run from your full overwrite tables and compare the final output. This is particularly useful if you turned up row level discrepancies whose impact you need to investigate.</p>
<h2>Prepare a DELETE and INSERT Query</h2>
<p>Much like we just added some stuff to the beginning of the query we used to overwrite tables to make it create a new table with the same data, we will add commands to the beginning of our old queries in this step. Note that, unlike before, we don’t need to wrap our original query in parentheses. I start with images as they are easier to decipher, what with the colours and indents, but the code is available at the end of this section in copy and paste/screen reader friendly text.</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1066" src="https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-full.png" alt="code that deletes the last 7 days of data from a table and rewrites using fresh data from the GA4 export" width="927" height="706" srcset="https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-full.png 927w, https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-full-300x228.png 300w, https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-full-768x585.png 768w" sizes="auto, (max-width: 927px) 100vw, 927px" /></p>
<p>Let’s break this into parts:</p>
<h3>We start by declaring and setting some useful default values in variables</h3>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1067" src="https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-vars.png" alt="code to declare variables with default values in BigQuery" width="922" height="120" srcset="https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-vars.png 922w, https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-vars-300x39.png 300w, https://informediteration.com/wp-content/uploads/2024/05/bq-ga4-7day-overwrite-vars-768x100.png 768w" sizes="auto, (max-width: 922px) 100vw, 922px" /></p>
<ul>
<li>dayof: this is an integer that tells us how many days into the current year we are. The testing version gives the day of the week instead.</li>
<li>start_date: the date we want to start pulling data from, which defaults to seven days ago.</li>
<li>start_date_string: same as start_date but in a string data type so we can easily pass it as a table suffix later.</li>
</ul>
<h3>Next, we decide if we are going to do a full overwrite or stick to a seven day overwrite</h3>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1068" src="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-104708.png" alt="an If statement in BigQuery" width="769" height="143" srcset="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-104708.png 769w, https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-104708-300x56.png 300w" sizes="auto, (max-width: 769px) 100vw, 769px" /></p>
<p>We use the MOD function with the dayof variable and the number of days we want to go between full overwrites. If the day of the year divided by 91 has no remainder (so, every quarter), we do a full overwrite. You can adjust the number you divide by to set the frequency with which you want to do the full overwrite.</p>
<p>The testing version of the MOD function divides by seven, and the test version of the dayof variable set in the previous step gives the day of the week, starting from Sunday. So in test mode, you’d get your full refresh every Saturday. Perfect for setting this up one week and testing the next, but again, you can adjust the numbers to your liking.</p>
<p>If our modulus is 0, we change the values of start_date and start_date_string to represent the first day of data we gathered. Otherwise, we leave them at the 7-day lookback window we set above.</p>
<h3>DELETE the Old</h3>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1069" src="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105142.png" alt="BigQuery DML: Deleting data from a table based on date" width="796" height="57" srcset="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105142.png 796w, https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105142-300x21.png 300w, https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105142-768x55.png 768w" sizes="auto, (max-width: 796px) 100vw, 796px" /></p>
<p>Note that the WHERE clause is using the Date column we set as our partition. Again, you can call this column whatever you’d like, but deleting (and inserting) based on the partition column uses far less data than otherwise.</p>
<h3>INSERT the New</h3>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1070" src="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105417.png" alt="Inserting a data from a specific date range into a BigQuery table with DML" width="714" height="252" srcset="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105417.png 714w, https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105417-300x106.png 300w" sizes="auto, (max-width: 714px) 100vw, 714px" /></p>
<p>Finally, we insert the data from the query we’ve always used into the destination table. The big difference is, instead of always going all the way back to our first day of data, we set the beginning _table_suffix to start_date_string. And most of the time, that start_date_string will only be going back a week &#8211; think of the query data you’ll save!</p>
<h2>Schedule the DELETE and INSERT Query</h2>
<p>There are a couple of important differences in how we schedule this kind of query, where we specify destinations directly in the query. As you might have guessed, you DO NOT tick the <em>Set a destination table for query results</em> box, as we’ve established the destination and the overwrite-vs-append behavior in the query itself.</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1071" src="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105815.png" alt="" width="684" height="610" srcset="https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105815.png 684w, https://informediteration.com/wp-content/uploads/2024/05/Screenshot-2024-05-17-105815-300x268.png 300w" sizes="auto, (max-width: 684px) 100vw, 684px" /></p>
<p>Otherwise this is no different from scheduling a normal query.</p>
<p>I hope you found this helpful and that you’re looking forward to saving a bunch of data. If you have any questions or other feedback, please leave a comment or contact me!</p>
<p>Here&#8217;s the plain text version:</p>
<pre>DECLARE dayof INT64 DEFAULT (EXTRACT(DAYOFYEAR FROM CURRENT_DATE()));
-- for testing use: DECLARE dayof INT64 DEFAULT (EXTRACT(DAYOFWEEK FROM CURRENT_DATE()));
DECLARE start_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
DECLARE start_date_string STRING DEFAULT FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY));

-- On full-overwrite days, widen the window all the way back to day 1
IF MOD(dayof, 91) = 0
-- for testing use: IF MOD(dayof, 7) = 0
THEN SET (start_date, start_date_string) =
  (CAST('your_first_day_of_data_as_YYYYMMDD' AS DATE FORMAT 'YYYYMMDD'), 'your_first_day_of_data_as_YYYYMMDD');
END IF;

-- Clear the window from the destination table...
DELETE FROM `your_project.destination_dataset.destination_table`
WHERE Date BETWEEN start_date AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);

-- ...then re-insert it fresh from the GA4 export
INSERT INTO `your_project.destination_dataset.destination_table`
SELECT
  CAST(event_date AS DATE FORMAT 'YYYYMMDD') AS Date,
  -- ... rest of your query
FROM `your_project.analytics_XXXXXXXXX.events_*`
WHERE _table_suffix BETWEEN
  start_date_string AND
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))</pre>
<p>The post <a href="https://informediteration.com/slash-your-ga4-bigquery-bills-by-only-overwriting-recent-data/">Slash Your GA4 BigQuery Bills By Only Overwriting Recent Data</a> appeared first on <a href="https://informediteration.com">Informed Iteration</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://informediteration.com/slash-your-ga4-bigquery-bills-by-only-overwriting-recent-data/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Partition Tables for Lower GA4 BigQuery and Looker Studio Costs</title>
		<link>https://informediteration.com/partition-tables-for-lower-ga4-bigquery-and-looker-studio-costs/</link>
					<comments>https://informediteration.com/partition-tables-for-lower-ga4-bigquery-and-looker-studio-costs/#respond</comments>
		
		<dc:creator><![CDATA[JF Amprimoz]]></dc:creator>
		<pubDate>Wed, 08 May 2024 16:00:03 +0000</pubDate>
				<category><![CDATA[GCP - BigQuery - Dataform]]></category>
		<category><![CDATA[BigQuery]]></category>
		<category><![CDATA[Google Analytics]]></category>
		<category><![CDATA[Looker Studio]]></category>
		<guid isPermaLink="false">https://informediteration.com/?p=1048</guid>

					<description><![CDATA[<p>With many of us only getting used to BigQuery in the last year or two, getting it to work with Looker Studio and Google Analytics 4 was a learning experience in many ways. A lot of introductory resources rightly focused on getting things up and running, leaving finer details of how to have them run [&#8230;]</p>
<p>The post <a href="https://informediteration.com/partition-tables-for-lower-ga4-bigquery-and-looker-studio-costs/">Partition Tables for Lower GA4 BigQuery and Looker Studio Costs</a> appeared first on <a href="https://informediteration.com">Informed Iteration</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>With many of us only getting used to BigQuery in the last year or two, getting it to work with Looker Studio and Google Analytics 4 was a learning experience in many ways. A lot of introductory resources rightly focused on getting things up and running, leaving finer details of how to have them run cheaply for later.</p>
<p>Not only would it be too much to learn at once, but given how recently many people had moved to GA4, there wouldn’t be tons of data gathered yet, so putting effort into controlling the size of queries wasn’t a priority.</p>
<p>Returning to the present, you might have noticed your bills creeping up, or maybe you are just looking to make sure that doesn’t happen. You might just be setting something up, and aren’t sure of what it will cost once all the users hit your reports, so you figure it’s better to build things to be efficient just in case.</p>
<p>Wherever you are in the process, partitioning is perhaps the easiest way to avoid unnecessary BigQuery costs, especially when connecting to Looker Studio.</p>
<h2>What You Need to Get Started</h2>
<p>We’re going to assume that you already have source tables you make by querying the Google Analytics 4 BigQuery export data, and saving the results manually or by scheduled queries. Figuring out what tables you need is a bit of an art, and figuring out how to make the queries for them is more of something you’d learn in a course than from a blog post I’m trying to keep brief.</p>
<p>For now, I’ll just say you can get a lot done with a session-scoped table, a pageview table, and a leads/sales table. I might write more about how to build those in the future, but if you want something more concrete, I’d recommend this <a href="https://testandlearn.community/learning-groups/prep-google-analytics-data-for-reporting-in-bigquery">course on how to prep GA4 data in BigQuery</a>.</p>
<p>Another note is that we’ll be setting up new scheduled queries in the examples, but you could just as easily add the partitions to existing scheduled queries if you preferred.</p>
<h2>The event_date Column Is Not a Date</h2>
<p>The good news is that the raw GA4 export is already partitioned by way of table suffixes. The bad news is that this doesn’t get inherited automatically by tables queried from that raw export. But, setting up your BQ tables with partitions, and getting Looker Studio to take advantage of that, is easy.</p>
<p>The first thing we need to do is establish what column we’ll be using to do the partitioning on, and event_date seems like a logical choice. There’s one small issue, which is that event_date is considered a string in the schema.</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1049" src="https://informediteration.com/wp-content/uploads/2024/05/Snip-BigQuery-WEB-GA4-BIG-QUERY-Google-Cloud-console-Google-Chro.png" alt="summary of fields in BigQuery GA4 export highlighting that event date is a string" width="656" height="347" srcset="https://informediteration.com/wp-content/uploads/2024/05/Snip-BigQuery-WEB-GA4-BIG-QUERY-Google-Cloud-console-Google-Chro.png 656w, https://informediteration.com/wp-content/uploads/2024/05/Snip-BigQuery-WEB-GA4-BIG-QUERY-Google-Cloud-console-Google-Chro-300x159.png 300w" sizes="auto, (max-width: 656px) 100vw, 656px" /></p>
<p>So instead of just selecting event_date, we’ll need to cast it as a date in the SELECT part of our query:</p>
<p><code>CAST(event_date AS DATE FORMAT 'YYYYMMDD') AS Event_Date</code></p>
<p>You can easily edit your queries to do this where you currently select the date. We don’t have to call the output <em>Event_Date</em>, but keep track of what you do call it as you’ll need it for the next step.</p>
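<p>In context, the start of a scheduled query might look like the sketch below. The table name and the extra columns are placeholders &#8211; the only essential piece is the cast:</p>
<pre>SELECT
  -- Cast the export's string event_date to a real DATE for partitioning
  CAST(event_date AS DATE FORMAT 'YYYYMMDD') AS Event_Date,
  event_name,
  COUNT(*) AS events
FROM `your_project.analytics_XXXXXXXXX.events_*`
GROUP BY Event_Date, event_name</pre>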
<h2>Tell BigQuery to Partition the Resulting Table</h2>
<p>Now that we have a suitable candidate, we need to tell BigQuery to use it as our partitioning column. There are lots of ways to do this, but I’m going to lean towards a very common and easy one: configuring it in a scheduled query.</p>
<p>When your query is ready, click <em>SCHEDULE</em> above the main editor window. In the sidebar that appears, name and schedule your query, then scroll down to the <em>Destination for Query Results</em> section. Then:</p>
<ol>
<li>Check the box to <em>Set a destination table for query results</em></li>
<li>Choose a dataset and name your new table</li>
<li>Put <em>Event_Date</em> (or whatever you called it) into the Destination table partitioning field</li>
</ol>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-1050" src="https://informediteration.com/wp-content/uploads/2024/05/partition-blog-scheduling.png" alt="steps in BigQuery interface to schedule a query that creates a partition table" width="688" height="1012" srcset="https://informediteration.com/wp-content/uploads/2024/05/partition-blog-scheduling.png 688w, https://informediteration.com/wp-content/uploads/2024/05/partition-blog-scheduling-204x300.png 204w" sizes="auto, (max-width: 688px) 100vw, 688px" /></p>
<p>Continue through the form and complete the remaining options. Because of the way the GA4 export works, unless you know what you are doing, it is usually best to select Overwrite table (4. above). The other options can all be left as is if you don’t know what they do. Click save when you are done.</p>
<p>Once your scheduled query has run, you’ll have a partitioned table you can connect to Looker Studio.</p>
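<p>If you want to confirm the partitions landed as expected, you can query the dataset’s INFORMATION_SCHEMA (a sketch with placeholder names):</p>
<pre>-- Lists one row per partition with its row count.
SELECT partition_id, total_rows
FROM `your_project.destination_dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'destination_table_name'
ORDER BY partition_id</pre>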
<h2>One More Crucial Click</h2>
<p>The last step is to make sure Looker Studio takes advantage of your table partitioning. Whether you are setting up a new data source or reconfiguring an existing one, the process is identical once you get to the step where you edit the connection, pictured below.</p>
<p>Choose the project, dataset, and table as appropriate. If everything went well in the steps above we’ll get a checkbox (1) to <em>Use Event_Date</em> (or whatever you called the column you are partitioning on) <em>as date range dimension</em>. Make sure you remember to tick this, or the queries Looker Studio runs to feed the reports won’t take advantage of the partitioning. Hit the Connect or Reconnect button at the top right when you are done (2).</p>
<p><img loading="lazy" decoding="async" class="alignnone size-large wp-image-1051" src="https://informediteration.com/wp-content/uploads/2024/05/partition-blog-looker-connection-1024x480.png" alt="Where to set a date partition field in the Looker Studio UI" width="1024" height="480" srcset="https://informediteration.com/wp-content/uploads/2024/05/partition-blog-looker-connection-1024x480.png 1024w, https://informediteration.com/wp-content/uploads/2024/05/partition-blog-looker-connection-300x141.png 300w, https://informediteration.com/wp-content/uploads/2024/05/partition-blog-looker-connection-768x360.png 768w, https://informediteration.com/wp-content/uploads/2024/05/partition-blog-looker-connection-1536x720.png 1536w, https://informediteration.com/wp-content/uploads/2024/05/partition-blog-looker-connection.png 1648w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></p>
<p>Thanks for reading, and please let me know if you have any thoughts or questions!</p>
<p>The post <a href="https://informediteration.com/partition-tables-for-lower-ga4-bigquery-and-looker-studio-costs/">Partition Tables for Lower GA4 BigQuery and Looker Studio Costs</a> appeared first on <a href="https://informediteration.com">Informed Iteration</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://informediteration.com/partition-tables-for-lower-ga4-bigquery-and-looker-studio-costs/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
