How QueryFuser Reduces Your BigQuery Bill
A purpose-built proxy that sits between your BI tools and BigQuery, designed from the ground up to eliminate redundant table scans.
BigQuery Charges Per Scan, Not Per Question
BigQuery’s on-demand pricing bills $5 for every terabyte of data scanned. When a dashboard loads, your BI tool fires a separate query for every tile. An 8-tile dashboard means 8 independent queries, each scanning the same fact table. BigQuery bills each one as a separate scan of the columns it reads.
The tiles show different things (revenue by region, orders by product, trend by month), but they all hit the same table, scanning the same amount column over and over. The data doesn’t change between those queries. The only difference is the GROUP BY and the join key.
QueryFuser was built to solve exactly this: read the data once, answer many questions. When multiple users open the same dashboard, the savings multiply further.
The Cost of Concurrency
amount + 1 join key: 2 units
amount + 3 keys: 4 units
1 unit = one INT64/FLOAT64 column (100 M rows × 8 B = 800 MB). Dashboards with more tiles or fewer dimensions save even more. When multiple users load the same dashboard, savings multiply further.
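The unit arithmetic can be worked through in a few lines of Python. This is a sketch under stated assumptions: the 8-tile dashboard, the 800 MB unit, and the $5 per terabyte rate come from the text above, while the decimal terabyte, the reading of "amount + 3 keys" as the fused query, and all names are illustrative.

```python
ROWS = 100_000_000                    # 100 M rows, as in the unit definition
BYTES_PER_VALUE = 8                   # INT64 / FLOAT64
UNIT_BYTES = ROWS * BYTES_PER_VALUE   # one column = 800 MB
PRICE_PER_TB = 5.0                    # on-demand rate quoted in the text (USD)
TB = 10**12                           # assumption: decimal terabyte


def scan_cost(units: int) -> float:
    """Cost in USD of scanning `units` columns of the example table."""
    return units * UNIT_BYTES / TB * PRICE_PER_TB


TILES = 8
separate_units = TILES * 2   # each tile scans amount + 1 join key
fused_units = 4              # assumed fused query: amount + 3 keys, read once

print(f"separate: {separate_units} units, ${scan_cost(separate_units):.3f}")
print(f"fused:    {fused_units} units, ${scan_cost(fused_units):.3f}")
```

Under these assumptions, one dashboard load drops from 16 units to 4, and the gap widens as more tiles share the same columns.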
How Query Fusion Works
A detailed look at each stage of the proxy pipeline, from receiving a PostgreSQL query to delivering BigQuery results.
Connection and Authentication
Your BI tool connects to QueryFuser using the standard PostgreSQL wire protocol. QueryFuser authenticates the user against its own user database (MD5 password authentication) and resolves which BigQuery project and service account credentials to use based on the connection's database name.
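For readers unfamiliar with PostgreSQL's MD5 password exchange, the sketch below shows the standard challenge-response computation in Python. It illustrates the protocol only. How QueryFuser stores or checks credentials internally is not described here, and the function names are hypothetical.

```python
import hashlib
import hmac


def stored_verifier(user: str, password: str) -> str:
    """Value a server can store instead of the plain password."""
    return "md5" + hashlib.md5((password + user).encode()).hexdigest()


def client_response(user: str, password: str, salt: bytes) -> str:
    """What a client sends after receiving the server's 4-byte salt."""
    inner = hashlib.md5((password + user).encode()).hexdigest()
    return "md5" + hashlib.md5(inner.encode() + salt).hexdigest()


def server_check(verifier: str, salt: bytes, response: str) -> bool:
    """Recompute the expected response from the stored verifier and compare."""
    expected = "md5" + hashlib.md5(verifier[3:].encode() + salt).hexdigest()
    return hmac.compare_digest(expected, response)
```

The server never needs the plain password: the stored verifier plus the per-connection salt is enough to validate the response.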
SQL Translation
Incoming PostgreSQL queries are parsed and translated to BigQuery dialect. This includes identifier quoting (backticks), function mappings (e.g., NOW() to CURRENT_TIMESTAMP()), type casts, and dataset qualification. Catalog queries (pg_catalog tables) are intercepted and answered from a local PostgreSQL metadata cache so BI tools see proper table and column metadata.
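A toy version of two of these rewrites (identifier quoting and function mapping) might look like the Python sketch below. It is regex-based for brevity and would mishandle double quotes inside string literals; a real translator works on a parsed syntax tree, as described above. The mapping table and function name are illustrative.

```python
import re

# Illustrative mapping; only the NOW() example comes from the text.
FUNCTION_MAP = {"NOW": "CURRENT_TIMESTAMP"}


def translate(pg_sql: str) -> str:
    """Rewrite a PostgreSQL query into BigQuery-flavoured SQL (toy version)."""
    # "identifier" -> `identifier`
    out = re.sub(r'"([^"]+)"', r"`\1`", pg_sql)
    # NOW( -> CURRENT_TIMESTAMP( (case-insensitive)
    for pg_name, bq_name in FUNCTION_MAP.items():
        out = re.sub(rf"\b{pg_name}\s*\(", f"{bq_name}(", out, flags=re.IGNORECASE)
    return out


print(translate('SELECT "region", now() FROM "orders"'))
```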
Merge Window Grouping
The translated BigQuery SQL is submitted to the merger. The merger maintains a map of active groups keyed by (user_id, project_id). Each incoming query is added to its group. The group is held open for a configurable merge window (default: 150 ms). If the group reaches its maximum size before the window expires, it flushes immediately.
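The grouping rule can be sketched as a small Python class. Time is passed in explicitly so the behaviour is easy to follow; the production merger is an async task, and the class name, method names, and the `max_size` default are assumptions made for this example. Only the (user_id, project_id) key and the 150 ms default come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    opened_at: float                       # time the first query arrived
    queries: list = field(default_factory=list)


class Merger:
    def __init__(self, window_s: float = 0.150, max_size: int = 16):
        self.window_s = window_s           # merge window (default 150 ms)
        self.max_size = max_size           # hypothetical default
        self.groups = {}                   # (user_id, project_id) -> Group

    def submit(self, user_id, project_id, sql, now):
        """Add a query; return the flushed group if it just became full."""
        key = (user_id, project_id)
        group = self.groups.setdefault(key, Group(opened_at=now))
        group.queries.append(sql)
        if len(group.queries) >= self.max_size:
            return self.groups.pop(key).queries
        return None

    def expire(self, now):
        """Flush every group whose merge window has elapsed."""
        due = [k for k, g in self.groups.items()
               if now - g.opened_at >= self.window_s]
        return {k: self.groups.pop(k).queries for k in due}
```

Because the key includes both the user and the project, queries from different customers land in different groups even when they arrive in the same window.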
Table Overlap Partitioning
When a group flushes, QueryFuser parses each query's FROM clause to extract referenced tables. Queries are then partitioned into sub-groups based on table overlap. Only queries sharing a configurable minimum number of tables are merged together. This prevents unrelated queries from being forced into the same merged query, which could hurt performance or cause unnecessary failures.
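One way to picture the partitioning step is the greedy Python sketch below. It is an approximation under stated assumptions: tables are extracted with a regex over FROM and JOIN rather than a real SQL parser, each query joins the first sub-group it overlaps with, and the names `referenced_tables`, `partition_by_overlap`, and `min_shared` are hypothetical.

```python
import re

# Crude table extraction; a real implementation would parse the SQL.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+`?([\w.\-]+)`?", re.IGNORECASE)


def referenced_tables(sql: str) -> set:
    return {name.lower() for name in TABLE_RE.findall(sql)}


def partition_by_overlap(queries, min_shared: int = 1):
    """Greedily place each query in the first sub-group sharing enough tables."""
    subgroups = []  # list of (tables seen so far, member queries)
    for sql in queries:
        tables = referenced_tables(sql)
        for group_tables, members in subgroups:
            if len(tables & group_tables) >= min_shared:
                members.append(sql)
                group_tables.update(tables)
                break
        else:
            subgroups.append((set(tables), [sql]))
    return [members for _, members in subgroups]
```

With `min_shared=1`, two queries over the same fact table are grouped while a query over an unrelated table stays on its own; raising the threshold makes merging more conservative.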
Query Fusion and Column Normalization
For each sub-group of overlapping queries, QueryFuser builds a single merged SQL statement that reads the shared tables once. If queries have different column counts or types, the results are automatically normalized so the merged query returns a valid, unified result set.
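The exact normalization scheme is not spelled out here, so the following Python sketch shows only one plausible approach: tag every branch with a query id, cast each column to a common type, and pad shorter select lists with NULLs so all branches have the same width. The `query_id` column, the `c0`, `c1`, ... aliases, and the blanket cast to STRING are assumptions for illustration, not QueryFuser's documented behaviour.

```python
def normalize_branches(select_lists):
    """Pad each query's select list to a common width and tag it with an id."""
    width = max(len(cols) for cols in select_lists)
    branches = []
    for query_id, cols in enumerate(select_lists):
        # Cast to one type so differently typed columns can share a slot.
        padded = [f"CAST({col} AS STRING)" for col in cols]
        # Fill missing slots with typed NULLs.
        padded += ["CAST(NULL AS STRING)"] * (width - len(cols))
        aliased = [f"{expr} AS c{i}" for i, expr in enumerate(padded)]
        branches.append(f"SELECT {query_id} AS query_id, " + ", ".join(aliased))
    return branches


for branch in normalize_branches([["region", "SUM(amount)"], ["month"]]):
    print(branch)
```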
Execution and Result Splitting
The merged query is sent to BigQuery via the REST API as a single job. BigQuery scans the shared tables once. When results return, QueryFuser splits the unified result set apart and delivers each caller’s rows through their PostgreSQL connection. Each caller sees exactly the result set they would have received from an individual query.
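Result splitting can be illustrated with a short Python sketch. It assumes, hypothetically, that each row of the unified result set carries a leading query id and that each caller's original column count is known; the actual tagging mechanism is not documented here.

```python
def split_results(rows, column_counts):
    """Route unified rows back to callers and trim padding columns.

    rows:          iterable of tuples (query_id, value0, value1, ...)
    column_counts: {query_id: number of columns the caller asked for}
    """
    per_caller = {query_id: [] for query_id in column_counts}
    for row in rows:
        query_id, values = row[0], row[1:]
        per_caller[query_id].append(tuple(values[:column_counts[query_id]]))
    return per_caller


unified = [(0, "EU", "120.5"), (1, "2024-01", None), (0, "US", "98.0")]
print(split_results(unified, {0: 2, 1: 1}))
```

Each caller receives only its own rows, in arrival order, with padding columns removed.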
Fallback on Failure
If the merged query fails for any reason (permission error, incompatible schemas, BigQuery limits), QueryFuser re-executes each constituent query individually in parallel. Queries that would succeed on their own still succeed. Queries that would fail still return their real error message. The merge attempt is logged with the error for diagnostics.
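The fallback path can be sketched with Python's asyncio. The two executor callables stand in for the real BigQuery calls and are hypothetical; the point is only the control flow: try the merged query, and on any failure run every constituent query concurrently so each caller gets its own result or its own error.

```python
import asyncio
import logging


async def run_with_fallback(queries, run_merged, run_single):
    """Return one entry per query: its rows, or the exception it raised."""
    try:
        # Assumed contract: run_merged returns a list with one result per query.
        return await run_merged(queries)
    except Exception as merge_error:
        # Log the merge failure for diagnostics, then fall back.
        logging.warning("merge failed, running individually: %s", merge_error)
        results = await asyncio.gather(
            *(run_single(q) for q in queries),
            return_exceptions=True,  # keep each query's real error
        )
        return list(results)
```

With `return_exceptions=True`, one failing query does not cancel the others, which matches the behaviour described above.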
Built for Performance and Reliability
QueryFuser is written in Rust using the Tokio async runtime. The entire proxy runs as a single process with no external dependencies beyond a PostgreSQL metadata database and BigQuery API access.
The merge loop is a single async task that handles grouping with sub-microsecond overhead per query. Actual BigQuery execution happens in independent async tasks, so one customer's slow query never blocks another customer's group from being assembled or flushed.
The proxy serves the PostgreSQL wire protocol using pgwire, supporting both simple and extended query protocols. This means compatibility with psql, JDBC drivers, ODBC drivers, and every major BI tool.
Async from Top to Bottom
Every I/O operation is non-blocking. A single QueryFuser instance can handle thousands of concurrent connections on a 2-core machine because 99% of the time is spent waiting on BigQuery network responses.
Customer Isolation
Merge groups are keyed by user ID and project ID. Two customers' queries never merge together, even if they run the exact same SQL at the same time. Credentials and billing are always isolated.
Zero Code Changes
No SDK, no library, no query annotations. Your BI tool thinks it is talking to PostgreSQL. QueryFuser handles everything transparently.