How Roblox Used Theta Sketches to Scale Creator Analytics

Analytics are essential to today’s real-time multiplayer games. At Roblox, we’re focused on developing measurement tools to help our creators thrive. Our free, out-of-the-box analytics give creators instant insight into their experiences' growth, user acquisition, and retention, helping them maximize their success.

Building the up-to-date analytics systems that millions of Roblox creators depend on is an important challenge. To tackle it, we’ve optimized our analytics query engine so that a 120-core processing cluster can serve more than 6 million queries per day from approximately 300,000 daily visitors accessing 86 TB of data. At the heart of our solution is an online analytical processing (OLAP) database that we chose for its scalability and integration with approximation algorithms. Using a mix of data rollup techniques plus the HyperLogLog and Theta Sketch algorithms, we provide analytics for millions of Roblox experiences1.

A Primer on OLAP Analytics

The more data that’s being queried, the longer it takes to produce results. When we decrease the data that needs to be read and speed up the analysis, creators get near real-time insight into the results of their actions. Some of the techniques we use include:

  1. Columnar storage: Our OLAP database, Druid, reads only the columns a query needs.

  2. Partition and sort filters: The OLAP reads only the relevant files, which index directly to the needed data blocks.

  3. Rollup: The OLAP partially aggregates events using common groupings.

Rollups, in particular, let OLAP databases operate in the space between the largest SQL query engines, like Spark or Presto (with latencies of tens of seconds), and point-query or limited-SQL systems, which usually serve fully aggregated data. With rollups, queries become keyed by groupings of dimensions, resulting in large reductions in total row cardinality. When looking at billions or even trillions of raw events, it can be much more efficient to roll them up into millions of groupings that can be aggregated with subsecond latency. For example:
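The example below is purely illustrative (the table and column names are hypothetical), but it captures the idea: billions of raw events collapse into one partially aggregated row per dimension grouping per day.

  -- A rollup is conceptually just a grouped aggregation keyed by a small set
  -- of dimensions; each output row summarizes many raw events.
  SELECT
    experience_id,
    platform,
    country,
    DATE_TRUNC('day', event_time) AS event_date,
    COUNT(*)             AS play_sessions,
    SUM(session_seconds) AS total_playtime_seconds
  FROM raw_engagement_events
  GROUP BY experience_id, platform, country, DATE_TRUNC('day', event_time)

At query time, the engine sums or merges these partial aggregates instead of scanning the raw events.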

While rollups offer the reduction advantages described above, certain metrics are resistant to them, including queries that require a full-table sort of the raw data, like distinct counts, percentiles, and frequency queries.

Fortunately, we can get around these limitations with techniques that return a statistically bounded approximate result based on complex data structures that hold a sample of the full dataset. These data structures are designed to be used in rollups, and two distinct counts can be combined via a union operation, much like adding two numbers together.
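For example, if each rolled-up row carries a Theta Sketch of the users it covers, a weekly distinct count is simply the union of seven daily sketches. The query below is a minimal sketch of the idea, assuming Druid’s DataSketches SQL functions and a hypothetical daily_rollup table whose user_theta column holds one sketch per rolled-up row:

  -- Merge pre-aggregated daily Theta Sketches into a weekly distinct user
  -- count without revisiting any raw events.
  SELECT
    experience_id,
    APPROX_COUNT_DISTINCT_DS_THETA(user_theta) AS weekly_active_users
  FROM daily_rollup
  WHERE event_date >= CURRENT_DATE - INTERVAL '7' DAY
  GROUP BY experience_id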

Breaking Down Roblox Analytics Workloads

At Roblox, we provide creators with a centralized dashboard where they can find their most important insights. These include:

  • Engagement: daily active users (DAU), monthly active users (MAU), retention, and funnels

  • Monetization: revenue, average revenue per user, sales, and economy

  • Acquisition data

  • Thumbnail personalization and experiment results

  • Home Recommendations analytics

  • And more to come.

When building our system, we focused on optimizing worst-case queries. These are typically large distinct counts (>100M UUIDs), like MAU for popular experiences, which can stretch load times from seconds to minutes. We built a statistical approximation framework to keep queries within a two-second latency budget. We adapted both HyperLogLog and Theta Sketch techniques from industry-standard libraries, which reduced the worst-case queries from reading more than 100 million rows to around 5 million.

Choosing and Optimizing the Query Engine

After selecting our OLAP solution, we loaded six months of engagement data and pressure-tested the system’s performance limits. With about 100 cores and 500 GB of memory, we found that we could merge 5 million randomly selected binary Theta Sketch objects (totaling roughly 100 MB) within two seconds. This was done on cold-start queries reading from disk, without touching any in-memory cache. Networked storage options such as S3 reads, which ClickHouse and DuckDB offer by default, showed significantly worse performance.

Overcoming Performance Challenges

In a final round of production shadow testing, we discovered an important challenge: Our MAU query performance degraded after we swapped from single large queries to per-day aggregation patterns, which are crucial for our creator analytics visualizations.

We found that the structure of the query greatly affected our OLAP solution’s underlying performance. Standard queries with multiple execution stages (like nested “GROUP BY” statements2) often push large portions of the work onto the lightweight broker nodes.
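As a simplified, hypothetical illustration of that shape (assuming Druid’s DataSketches SQL functions and the daily_rollup table from earlier), the inner GROUP BY below produces one merged sketch per day on the historical nodes, and the outer GROUP BY forces the broker to merge every one of those large sketches itself:

  -- Naive nested GROUP BY: the outer merge of large daily sketches lands on
  -- the lightweight broker.
  SELECT
    result_month,
    APPROX_COUNT_DISTINCT_DS_THETA(daily_theta) AS mau
  FROM (
    SELECT
      TIME_FLOOR(__time, 'P1M') AS result_month,
      TIME_FLOOR(__time, 'P1D') AS event_date,
      DS_THETA(user_theta)      AS daily_theta
    FROM daily_rollup
    WHERE experience_id = 12345
    GROUP BY TIME_FLOOR(__time, 'P1M'), TIME_FLOOR(__time, 'P1D')
  ) per_day
  GROUP BY result_month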

This is a classic big data problem, where some portion of a query ends up executing on small but critical serving nodes. We expected our approximate data structures to behave like simple counts or sums, but we discovered they behaved quite differently.

The figure below illustrates the problem. It shows how our historical nodes would do partial aggregation, rolling up a Theta Sketch for each day and then pushing their data back to the broker. The broker then attempted to merge the large daily sketches into a single monthly value for each result day. For 30 days of MAU, this meant merging 1,800 maximum-size Theta Sketches on the broker, which resulted in a slow, failure-prone query that monopolized the broker CPU.

Our solution was to run the OLAP with fewer, larger historical workers to maximize data locality for data sources that relied on approximation queries. In practice, this pushed a merge operation that could have required more than 100 MB of data processing back onto our historical nodes.

To achieve this in SQL, we used an inline join to propagate the necessary information to the historical nodes, preparing the query with an inline list of result dates. Each result date can then gather the relevant data from the historical node segments. The data is then passed back to the broker, where results are quickly merged into a single map of result date to metric data, as seen below.
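The snippet below is a schematic sketch of the pattern, not our exact production query (it blends generic SQL with the Druid-style sketch functions used above, and all names and dates are hypothetical). Inlining the result dates lets each rollup row be matched to every result date it contributes to, so the per-date sketch merges stay close to the data and the broker only combines one small partial result per result date.

  -- Inline list of result dates joined against the rollup: the heavy sketch
  -- merges are keyed by result_date before they ever reach the broker.
  SELECT
    d.result_date,
    APPROX_COUNT_DISTINCT_DS_THETA(r.user_theta) AS mau
  FROM daily_rollup r
  CROSS JOIN (
    VALUES DATE '2025-03-26', DATE '2025-03-27', DATE '2025-03-28'
  ) AS d (result_date)
  WHERE r.experience_id = 12345
    AND r.event_date >  d.result_date - INTERVAL '30' DAY
    AND r.event_date <= d.result_date
  GROUP BY d.result_date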

This optimization had a dramatic impact on performance for large-scale queries. For a major experience’s MAU breakdown by country, average query performance improved by 5x (from 17.53 seconds to 3.23 seconds), as shown in the chart below. We also saw a 50x reduction in CPU time on the broker (from 16.83 seconds down to 0.34 seconds).

While results vary, this highlights the importance of treating complex operations (such as merging millions of sketches) with care. Assuming these operations are equivalent to simple aggregations can lead to significant performance issues, especially on systems where last-mile client aggregations are common.

The average analytics query on our platform has minimal breakdowns and rarely touches high-cardinality dimensions (such as country). Knowing this, we decided to duplicate our data, creating a rollup table with fewer dimensions that is sufficient for more than 98%3 of our queries. It performed four times better on the average query.
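As a hypothetical illustration of that narrower replica (names are illustrative, and in practice tables like this are built upstream rather than at query time), the reduced rollup keeps only the keys the average query actually needs while still carrying a mergeable sketch:

  -- Reduced rollup: dropping high-cardinality dimensions such as country
  -- shrinks the row count; the per-row sketches still merge correctly
  -- across the dropped dimension.
  SELECT
    experience_id,
    platform,
    DATE_TRUNC('day', event_time) AS event_date,
    DS_THETA(user_id) AS user_theta
  FROM raw_engagement_events
  GROUP BY experience_id, platform, DATE_TRUNC('day', event_time)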

We also explored a theta cube, a generalized approach to bridging the gap between basic rollup tables and fully raw tables via approximate set intersections. This approach addresses a fundamental limitation: Rollup tables lose their advantage when queries need to touch many layers of high-cardinality dimensions, because each added dimension causes the rollup cardinality to scale as ∏dim (the product of the dimension cardinalities).

We designed a system that would aggregate by common dimension groups with a cardinality cap, allowing rollup-level performance for queries on anything within a group. Then, when looking for combinations of dimensions across groups, we would attempt an approximate join4 across the sets and return the metric results along with error estimates. A query with high estimated error would be forwarded to a raw table, where the many filters should allow for large pushdown optimizations.
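One way to picture the cube: it stores one sketch row per value of each dimension group, tagged with the group it belongs to, matching the theta_cube layout in the footnote. The construction below is a hypothetical illustration in the same Druid-style SQL as earlier; in practice a table like this would be built upstream (for example, in Spark with the UDAF library described later).

  -- Build one Theta Sketch row per (experience, country) and per
  -- (experience, platform), tagged with agg_level so cross-group questions
  -- can later be answered by intersecting sketches.
  SELECT experience_id, 'country' AS agg_level,
         country, CAST(NULL AS VARCHAR) AS platform,
         DS_THETA(user_id) AS user_theta
  FROM raw_engagement_events
  GROUP BY experience_id, country
  UNION ALL
  SELECT experience_id, 'platform' AS agg_level,
         CAST(NULL AS VARCHAR) AS country, platform,
         DS_THETA(user_id) AS user_theta
  FROM raw_engagement_events
  GROUP BY experience_id, platform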

This theta cube approach trades dimensionality for rows, resulting in a ∑dim (sum of dimensions) expansion in the number of rows instead of the ∏dim expansion. Of course, this can sacrifice accuracy: The accuracy of an intersection is roughly proportional to the size of the overlap5 between the two dimension groups. The underlying reason relates directly to how Theta Sketches store a bottom-K-style sorted list of hashes, which maximizes the retained collisions between two sets with high inherent overlap.
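For illustration only: with two dimension groups of roughly 200 countries and 20 platforms, a combined rollup needs up to 200 × 20 = 4,000 rows per experience-day, while the theta cube needs only 200 + 20 = 220, recovering the country-by-platform combinations at query time through approximate intersections.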

Because we can calculate this error estimate quickly, a high error also serves as a strong signal that reading the raw table will likely be performant. In cases where the overlapping data is small relative to the union (Japanese speakers in Germany, for example), a large number of the raw table rows will be filtered out, resulting in efficient pushdown optimizations. A system that uses dimension groups, approximate joins, and error-based raw table reads would truly maximize rollup performance on approximation-friendly queries.
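A hypothetical sketch of that fallback path (names are illustrative): when the intersection’s estimated error is high, the same question is answered from the raw table, where the highly selective filters let the engine prune most rows before the distinct count is computed.

  -- Raw-table fallback for a low-overlap combination: the selective filters
  -- prune most rows through partition and filter pushdown.
  SELECT COUNT(DISTINCT user_id) AS users
  FROM raw_engagement_events
  WHERE experience_id = 12345
    AND country = 'DE'
    AND user_language = 'ja'
    AND event_date >= DATE '2025-03-01'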

For Roblox, this solution will be more applicable at our next level of scale—potentially for dynamic funnel or custom event analysis—while our current simple rollup replica satisfies today’s needs.

Building a Self-Serve Platform

With our broker optimized, we turned toward building tools for onboarding and querying datasets added to our OLAP solution. We built an open-source Spark and Trino UDAF library for our datasketch functions, allowing Spark to use the same binary datasketch format as our OLAP6. This kept most of our compute workload in Spark and helped standardize approximation across Roblox, potentially reducing compute costs by up to 80% for certain datasets.
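As a hypothetical illustration (theta_sketch_agg is a placeholder, not the library’s actual function name), a Spark SQL job can pre-build the binary sketch column that the OLAP later merges at query time:

  -- Build binary Theta Sketches in Spark once the library's UDAFs are
  -- registered; the resulting column is ingested by the OLAP as-is.
  SELECT
    experience_id,
    country,
    DATE_TRUNC('day', event_time) AS event_date,
    theta_sketch_agg(user_id) AS user_theta
  FROM raw_engagement_events
  GROUP BY experience_id, country, DATE_TRUNC('day', event_time)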

We simplified onboarding with an internal extension to our batch job scheduler and defined a dataframe-style API that guides developers to decide on a definitive set of measures and dimensions, reducing the impact of open-ended queries. We’ve also open-sourced some sample workflows for how we load and query this data in our OLAP.

Our optimized analytics datasets are now providing deep insights to our creators. Our optimizations improved average performance by 4x and worst-case performance by 50x. The self-serve platform enables our Creator Analytics team to continue iterating on new datasets for developers. We’re excited to see developers of all sizes use these tools to create incredible experiences on Roblox.

1 Calculated as the number of unique universes with any access over the last 60 days
2 Like this naive MAU query
3 Results are from March 21-28, 2025
4 Performed like this:
  SELECT c.experience_id, c.country, p.platform,
         THETA_INTERSECT(c.user_theta, p.user_theta)
  FROM (SELECT experience_id, country, user_theta FROM theta_cube WHERE agg_level = 'country') c
  JOIN (SELECT experience_id, platform, user_theta FROM theta_cube WHERE agg_level = 'platform') p
    ON c.experience_id = p.experience_id
5 https://datasketches.apache.org/docs/Theta/ThetaSketchSetOpsAccuracy.html
6 Through the Druid SQL function COMPLEX_DECODE_BASE64('HLLSketch', sketch_col_name).