For MySQL 8+: use the recursive with
syntax.
For MySQL 5.x: use inline variables, path IDs, or self-joins.
MySQL 8+
with recursive cte (id, name, parent_id) as ( select id, name, parent_id from products where parent_id = 19 union all select p.id, p.name, p.parent_id from products p inner join cte on p.parent_id = cte.id ) select * from cte;
The value specified in parent_id = 19
should be set to the id
of the parent you want to select all the descendants of.
MySQL 5.x
For MySQL versions that do not support Common Table Expressions (up to version 5.7), you would achieve this with the following query:
select id, name, parent_id from (select * from products order by parent_id, id) products_sorted, (select @pv := '19') initialisation where find_in_set(parent_id, @pv) and length(@pv := concat(@pv, ',', id))
Here is a fiddle.
Here, the value specified in @pv := '19'
should be set to the id
of the parent you want to select all the descendants of.
This will work also if a parent has multiple children. However, it is required that each record fulfills the condition parent_id < id
, otherwise the results will not be complete.
Variable assignments inside a query
This query uses specific MySQL syntax: variables are assigned and modified during its execution. Some assumptions are made about the order of execution:
- The
from
clause is evaluated first. So that is where@pv
gets initialised. - The
where
clause is evaluated for each record in the order of retrieval from thefrom
aliases. So this is where a condition is put to only include records for which the parent was already identified as being in the descendant tree (all descendants of the primary parent are progressively added to@pv
). - The conditions in this
where
clause are evaluated in order, and the evaluation is interrupted once the total outcome is certain. Therefore the second condition must be in second place, as it adds theid
to the parent list, and this should only happen if theid
passes the first condition. Thelength
function is only called to make sure this condition is always true, even if thepv
string would for some reason yield a falsy value.
All in all, one may find these assumptions too risky to rely on. The documentation warns:
you might get the results you expect, but this is not guaranteed […] the order of evaluation for expressions involving user variables is undefined.
So even though it works consistently with the above query, the evaluation order may still change, for instance when you add conditions or use this query as a view or sub-query in a larger query. It is a “feature” that will be removed in a future MySQL release:
Previous releases of MySQL made it possible to assign a value to a user variable in statements other than
SET
. This functionality is supported in MySQL 8.0 for backward compatibility but is subject to removal in a future release of MySQL.
As stated above, from MySQL 8.0 onward you should use the recursive with
syntax.
Efficiency
For very large data sets this solution might get slow, as the find_in_set
operation is not the most ideal way to find a number in a list, certainly not in a list that reaches a size in the same order of magnitude as the number of records returned.
Alternative 1: with recursive
, connect by
More and more databases implement the SQL:1999 ISO standard WITH [RECURSIVE]
syntax for recursive queries (e.g. Postgres 8.4+, SQL Server 2005+, DB2, Oracle 11gR2+, SQLite 3.8.4+, Firebird 2.1+, H2, HyperSQL 2.1.0+, Teradata, MariaDB 10.2.2+). And as of version 8.0, also MySQL supports it. See the top of this answer for the syntax to use.
Some databases have an alternative, non-standard syntax for hierarchical look-ups, such as the CONNECT BY
clause available on Oracle, DB2, Informix, CUBRID and other databases.
MySQL version 5.7 does not offer such a feature. When your database engine provides this syntax or you can migrate to one that does, then that is certainly the best option to go for. If not, then also consider the following alternatives.
Alternative 2: Path-style Identifiers
Things become a lot easier if you would assign id
values that contain the hierarchical information: a path. For example, in your case this could look like this:
ID | NAME |
---|---|
19 | category1 |
19/1 | category2 |
19/1/1 | category3 |
19/1/1/1 | category4 |
Then your select
would look like this:
select id, name from products where id like '19/%'
Alternative 3: Repeated Self-joins
If you know an upper limit for how deep your hierarchy tree can become, you can use a standard sql
query like this:
select p6.parent_id as parent6_id, p5.parent_id as parent5_id, p4.parent_id as parent4_id, p3.parent_id as parent3_id, p2.parent_id as parent2_id, p1.parent_id as parent_id, p1.id as product_id, p1.name from products p1 left join products p2 on p2.id = p1.parent_id left join products p3 on p3.id = p2.parent_id left join products p4 on p4.id = p3.parent_id left join products p5 on p5.id = p4.parent_id left join products p6 on p6.id = p5.parent_id where 19 in (p1.parent_id, p2.parent_id, p3.parent_id, p4.parent_id, p5.parent_id, p6.parent_id) order by 1, 2, 3, 4, 5, 6, 7;
See this fiddle
The where
condition specifies which parent you want to retrieve the descendants of. You can extend this query with more levels as needed.