The best way to think about this is by looking at what happens on a row-by-row basis. Setting up some input data, we have:
WITH t1 AS ( SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL SELECT 2, [2, 4, 5] ) ...
(I’m using a third element for the second row to make things more interesting). If we just select from it, we get output that looks like this:
WITH t1 AS ( SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL SELECT 2, [2, 4, 5] ) SELECT * FROM t1; +----+---------------+ | id | numbers_array | +----+---------------+ | 1 | [0, 1] | | 2 | [2, 4, 5] | +----+---------------+
Now let’s talk about unnesting. The UNNEST
function takes an array and returns a value table of the array’s element type. Whereas most BigQuery tables are SQL tables defined as a collection of columns, a value table has rows of some value type. For numbers_array
, UNNEST(numbers_array)
returns a value table whose value type is INT64
, since numbers_array
is an array with an element type of INT64
. This value table contains all of the elements in numbers_array
for the current row from t1
.
For the row with an id
of 1, the contents of the value table returned by UNNEST(numbers_array)
are:
+-----+ | f0_ | +-----+ | 0 | | 1 | +-----+
This is the same as what we get with the following query:
SELECT * FROM UNNEST([0, 1]);
UNNEST([0, 1])
in this case means “create a value table from the INT64
values 0
and 1
“.
Similarly, for the row with an id
of 2, the contents of the value table returned by UNNEST(numbers_array)
are:
+-----+ | f0_ | +-----+ | 2 | | 4 | | 5 | +-----+
Now let’s talk about how CROSS JOIN
fits into the picture. In most cases, you use CROSS JOIN
between two uncorrelated tables. In other words, the contents of the table on the right of the CROSS JOIN
are not defined by the current contents of the table on the left.
In the case of arrays and UNNEST
, however, the contents of the value table produced by UNNEST(numbers_array)
change depending on the current row of t1
. When we join the two tables, we get the cross product of the current row from t1
with all of the rows from UNNEST(numbers_array)
. For example:
WITH t1 AS ( SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL SELECT 2, [2, 4, 5] ) SELECT id, number FROM t1 CROSS JOIN UNNEST(numbers_array) AS number; +----+--------+ | id | number | +----+--------+ | 1 | 0 | | 1 | 1 | | 2 | 2 | | 2 | 4 | | 2 | 5 | +----+--------+
numbers_array
has two elements in the first row and three elements in the second, so we get 2 + 3 = 5
rows in the result of the query.
To answer the question about how this differs from flattening the numbers_array
and then performing a CROSS JOIN
, let’s look at the results of this query:
WITH t1 AS ( SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL SELECT 2, [2, 4, 5] ), t2 AS ( SELECT number FROM t1 CROSS JOIN UNNEST(numbers_array) AS number ) SELECT number FROM t2; +--------+ | number | +--------+ | 0 | | 1 | | 2 | | 4 | | 5 | +--------+
In this case, t2
is is a SQL table with a column named number
with those values. If we perform a CROSS JOIN
between t1
and t2
, we get a true cross product of all rows:
WITH t1 AS ( SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL SELECT 2, [2, 4, 5] ), t2 AS ( SELECT number FROM t1 CROSS JOIN UNNEST(numbers_array) AS number ) SELECT id, numbers_array, number FROM t1 CROSS JOIN t2; +----+---------------+--------+ | id | numbers_array | number | +----+---------------+--------+ | 1 | [0, 1] | 0 | | 1 | [0, 1] | 1 | | 1 | [0, 1] | 2 | | 1 | [0, 1] | 4 | | 1 | [0, 1] | 5 | | 2 | [2, 4, 5] | 0 | | 2 | [2, 4, 5] | 1 | | 2 | [2, 4, 5] | 2 | | 2 | [2, 4, 5] | 4 | | 2 | [2, 4, 5] | 5 | +----+---------------+--------+
So what’s the difference between this and the previous query with CROSS JOIN UNNEST(numbers_array)
? In this case, the contents of t2
don’t change for each row from t1
. For the first row in t1
, there are five rows in t2
. For the second row in t1
, there are five rows in t2
. As a result, the CROSS JOIN
between the two of them returns 5 + 5 = 10
rows in total.