Bruin - Unified Analytics Platform

Tutorials

Summarising User Behaviour: Users Daily Table

Data Modeling for Google Analytics 4 and Firebase Analytics

Sabri Karagonen's avatar
Sabri Karagonen
Jun 03, 2024

This is the third blog post in the Modeling Google Analytics & Firebase Data series. You can find the previous posts here:

  1. Exporting Firebase Data to BigQuery

  2. Firebase Events Table


A healthy Data Warehouse (DWH) consists of a balanced combination of aggregated data models and raw event data. Aggregated models help in several ways:

  • Lower compute costs by reducing the volume of data processed.

  • Higher performance and faster queries due to the smaller dataset size.

  • Simplified querying with embedded business metrics and facts.

  • Consistent analysis by standardizing metrics and data cleaning.

  • Easy feature calculation for ML pipelines.

Regardless of business or industry, the most common data model for clickstream events is the daily user aggregate, since most dashboards are built at daily granularity. I like naming tables after their primary keys because it makes them more semantic, so I’ll call this table `users_daily`.

This table will contain one row for each active user on a given day. In the end, it will include all the user metrics and attributes we need to understand the business. It’s very easy to create dashboards on top of this table, so you’ll love it.

In the previous article we created an `events.events` table, so `users_daily` will be built on top of it. Let’s start with the simplest metrics of the Google Analytics Sample Gaming Project.

The starter template is:

SELECT
    user_pseudo_id,
    dt,
    --- attributes
    --- metrics
FROM events.events
WHERE -- data cleaning
GROUP BY 1,2

In this query, you can use user_pseudo_id instead of user_id, but the main idea is that we should never add any other columns to the GROUP BY. This way, we ensure the primary key (PK) stays consistent.
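To verify the PK actually holds, a quick sanity check (a sketch, assuming the result is materialized as a `users_daily` table) is to look for duplicated (user_pseudo_id, dt) pairs. If the GROUP BY is correct, this returns zero rows:

SELECT user_pseudo_id, dt, count(*) as row_cnt
FROM users_daily
GROUP BY 1, 2
HAVING count(*) > 1

Running a check like this after every change to the model is a cheap way to catch accidental fan-out before it reaches your dashboards.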

Where

This table contains only user activity data, so I recommend excluding data that isn’t relevant to user activity. For example, Firebase triggers the app_remove, os_update, app_clear_data, app_update, and app_exception events even when the app is not open.

Also, if a row doesn’t have a user_id, it doesn’t make sense to aggregate it. Let’s exclude those rows, too.

So now, our query looks like:

SELECT
    user_id,
    dt,
    --- attributes
    --- metrics
FROM events.events
WHERE 
    user_id is not null
    and event_name not in ("app_remove", "os_update", "app_clear_data", "app_update", "app_exception")
GROUP BY 1,2

Metrics

Metrics are aggregates that show what users did within the day. The simplest way to obtain the main metrics is to count the main events.

When I query the table, I see the following events as metric candidates:

SELECT event_name, count(*) as cnt
FROM `bruin-blog-firebase-demo.events.events_json` 
WHERE event_name not in ("app_remove", "os_update", "app_clear_data", "app_update", "app_exception")
GROUP BY 1

Counting these events gives us the first version of our metrics:

SELECT
    user_id,
    dt,
    --- attributes
    --- metrics
    count(*) as event_cnt,

    -- app monetization metrics
    countif(event_name="ad_reward") as ad_reward_cnt,
    countif(event_name="in_app_purchase") as iap_cnt,

    -- app engagement metrics
    countif(event_name = "level_start") as level_start_cnt,
    countif(event_name = "level_end") as level_end_cnt,
    countif(event_name = "level_complete") as level_complete_cnt,
    countif(event_name = "level_fail") as level_fail_cnt,
    countif(event_name = "level_retry") as level_retry_cnt,
    countif(event_name = "level_start_quickplay") as level_start_quickplay_cnt,
    countif(event_name = "level_end_quickplay") as level_end_quickplay_cnt,
    countif(event_name = "level_complete_quickplay") as level_complete_quickplay_cnt,
    countif(event_name = "level_fail_quickplay") as level_fail_quickplay_cnt,
    countif(event_name = "level_retry_quickplay") as level_retry_quickplay_cnt,
    countif(event_name = "level_up") as level_up_cnt,
    countif(event_name = "challenge_a_friend") as challenge_a_friend_cnt,
    countif(event_name = "challenge_accepted") as challenge_accepted_cnt,
    countif(event_name = "no_more_extra_steps") as no_more_extra_steps_cnt,
    countif(event_name = "use_extra_steps") as use_extra_steps_cnt,
    countif(event_name = "select_content") as select_content_cnt,
    
FROM events.events
WHERE 
    user_id is not null
    and event_name not in ("app_remove", "os_update", "app_clear_data", "app_update", "app_exception")
GROUP BY 1,2

As you can see, I tried to add all the relevant events to the summary table. This will help us monitor user behaviour in dashboards very quickly.

In the second iteration of the metrics, we will use event parameters. This process requires a good understanding of the events and their parameters. For example, the level events have these parameters:

SELECT event_name, ep as event_params
FROM `bruin-blog-firebase-demo.events.events_json` 
WHERE event_name like "level%"

Using these events, we can obtain the minimum and maximum level played on that day by simply adding these lines to the query:

SELECT
    user_id,
    dt,

    min(level) as min_level,
    max(level) as max_level,

FROM events.events
WHERE ...
GROUP BY 1,2

Here, you should be very careful that the level parameter is not used by other events in a different context. If it is, you can filter within the aggregation for specific metrics:

    min(if(event_name like 'level_%', level, null)) as min_level,
    max(if(event_name like 'level_%', level, null)) as max_level,
    min(if(event_name = 'level_complete', level, null)) as min_level_completed,
    max(if(event_name = 'level_complete', level, null)) as max_level_completed,

Let’s inspect the in_app_purchase event to add the revenue to our table:

SELECT event_name, event_params, event_value_in_usd
FROM `bruin-blog-firebase-demo.events.events_json` 
WHERE event_name = "in_app_purchase"

As you can see, there are currency and price parameters. However, since purchases come in different currencies, summing the price directly is impossible. To solve this problem, Google Analytics provides the `event_value_in_usd` field. Let’s add it to our metrics as well:

    sum(if(event_name="in_app_purchase", event_value_in_usd, 0)) as iap_revenue

After that point, I recommend learning the product very well, deciding on the metrics with product managers, and adding all of them to the users_daily table. The only exception is calculated metrics (ratios and rates), which I recommend computing in the BI layer.

For example, if you want to calculate win_rate, it’s better to add level_start_cnt and level_complete_cnt to this table and compute level_complete_cnt / level_start_cnt in the BI tool.
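The reason to leave the ratio to the BI layer is that a win rate should be computed as a ratio of sums, not an average of per-user ratios; once the raw counts live in the table, any slice stays correct. A hypothetical dashboard query (assuming the table is materialized as `users_daily`) might look like:

SELECT
    dt,
    safe_divide(sum(level_complete_cnt), sum(level_start_cnt)) as win_rate
FROM users_daily
WHERE dt >= date_sub(current_date(), interval 7 day)
GROUP BY 1

`SAFE_DIVIDE` avoids division-by-zero errors on days where no levels were started.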

Attributes

Attributes are non-aggregatable pieces of information that provide additional details about the user, such as the device brand, country, OS version, and platform. To summarize attributes, I recommend two different methods based on their type:

  1. Dynamic Values

  2. Static Values

Let’s start with the country attribute. A user can use your app in more than one country during a single day. For such dynamic values, it makes sense to capture the first and last values of the day. Country, os_version, and app_version are examples of this group.

To get them, you can use the `MIN_BY` and `MAX_BY` functions:

SELECT
    user_id,
    dt,

    MIN_BY(country, ts) as first_country,
    MAX_BY(country, ts) as last_country,

FROM events.events
WHERE ...
GROUP BY 1,2

Some values never change for a given user id. For example, user_pseudo_id is assigned right after the app is installed, so if you are using user_pseudo_id as the primary key, you can expect platform, device brand, and device model to be stationary. For this kind of attribute, we can use `ANY_VALUE`, since the value is the same for every row of a user.

SELECT
    user_pseudo_id,
    dt,

    ANY_VALUE(platform) as platform,
    ANY_VALUE(device_brand) as device_brand,
    ANY_VALUE(device_model) as device_model,

FROM events.events
WHERE ...
GROUP BY 1,2

If your user ids come from a backend system, I don’t recommend `ANY_VALUE`; use first/last values (e.g. `MIN_BY`/`MAX_BY` by timestamp) instead, since a user can then appear on multiple devices and all the attributes become dynamic.

Lastly, I want to mention user behaviour attributes. Like metrics, they describe user behaviour, but they are mostly strings, so summing or counting them doesn’t make sense.

One example is screen or event_name. If you want to measure where users leave your app, you might want to calculate the first/last screen and event:

SELECT
    user_pseudo_id,
    dt,

    MIN_BY(event_name, ts) as first_event,
    MAX_BY(event_name, ts) as last_event,
    MIN_BY(screen, ts) as first_screen,
    MAX_BY(screen, ts) as last_screen,

FROM events.events
WHERE ...
GROUP BY 1,2

When we bring everything together, you’ll have a query like:

SELECT
    user_id,
    dt,

    MIN_BY(country, ts) as first_country,
    MAX_BY(country, ts) as last_country,
    MIN_BY(device_brand, ts) as first_device_brand,
    MAX_BY(device_brand, ts) as last_device_brand,
    MIN_BY(os_version, ts) as first_os_version,
    MAX_BY(os_version, ts) as last_os_version,
    MIN_BY(device_language, ts) as first_device_language,
    MAX_BY(device_language, ts) as last_device_language,

    count(*) as event_cnt,
    min(ts) as min_ts,
    max(ts) as max_ts,

    MIN_BY(event_name, ts) as first_event,
    MAX_BY(event_name, ts) as last_event,
    MIN_BY(screen, ts) as first_screen,
    MAX_BY(screen, ts) as last_screen,

    min(level) as min_level,
    max(level) as max_level,
    count(distinct timestamp_trunc(ts, minute)) as session_duration, -- distinct active minutes, a proxy for time spent

    -- app monetization metrics
    countif(event_name="ad_reward") as ad_reward_cnt,
    countif(event_name="in_app_purchase") as iap_cnt,
    sum(if(event_name="in_app_purchase", event_value_in_usd, 0)) as iap_revenue,

    -- app engagement metrics
    countif(event_name = "level_start") as level_start_cnt,
    countif(event_name = "level_end") as level_end_cnt,
    countif(event_name = "level_complete") as level_complete_cnt,
    countif(event_name = "level_fail") as level_fail_cnt,
    countif(event_name = "level_retry") as level_retry_cnt,
    countif(event_name = "level_start_quickplay") as level_start_quickplay_cnt,
    countif(event_name = "level_end_quickplay") as level_end_quickplay_cnt,
    countif(event_name = "level_complete_quickplay") as level_complete_quickplay_cnt,
    countif(event_name = "level_fail_quickplay") as level_fail_quickplay_cnt,
    countif(event_name = "level_retry_quickplay") as level_retry_quickplay_cnt,
    countif(event_name = "level_up") as level_up_cnt,
    countif(event_name = "challenge_a_friend") as challenge_a_friend_cnt,
    countif(event_name = "challenge_accepted") as challenge_accepted_cnt,
    countif(event_name = "no_more_extra_steps") as no_more_extra_steps_cnt,
    countif(event_name = "use_extra_steps") as use_extra_steps_cnt,
    countif(event_name = "select_content") as select_content_cnt,
    
FROM events.events
WHERE 
    user_id is not null
    and event_name not in ("app_remove", "os_update", "app_clear_data", "app_update", "app_exception")
GROUP BY 1,2

Running on Production

Since the previous days’ data never changes, I recommend using an incremental flow for this table. You can find the full version of this pipeline here.
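A minimal incremental sketch (dataset and table names are illustrative) recomputes only yesterday’s partition and replaces it, since older days never change:

DELETE FROM reports.users_daily
WHERE dt = date_sub(current_date(), interval 1 day);

INSERT INTO reports.users_daily
SELECT
    user_id,
    dt,
    -- ... same attributes and metrics as the full query above ...
FROM events.events
WHERE
    dt = date_sub(current_date(), interval 1 day)
    and user_id is not null
    and event_name not in ("app_remove", "os_update", "app_clear_data", "app_update", "app_exception")
GROUP BY 1, 2

This keeps each run’s cost proportional to one day of events instead of the whole history.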

Also, you can run `bruin init firebase` in your terminal to bring the full Firebase Analytics pipeline to your computer.

If you have questions about this article or running Bruin pipelines, don’t forget to join our Slack community.

If you need help running your data pipelines, visit the Bruin website. Bruin is an end-to-end data product platform that makes your data analysts and scientists 10x more effective!
