Introduction to MySQL 8.0 Common Table Expressions (Part 1)

栏目: IT技术 · 发布时间: 4年前

内容简介:We can define a CTE as an alternative to a derived table. In a small way, CTE simplifies complex joins and subqueries, improving the readability of the queries. CTE is part of ANSI SQL 99 and was introduced in MySQL 8.0.1. The same feature is available eve

Introduction to MySQL 8.0 Common Table Expressions (Part 1) This blog is the first part of a two-articles series. In this article, I’m going to introduce the Common Table Expression (CTE), a new feature available on MySQL 8.0, as well as Percona Server for MySQL 8 .

What is a Common Table Expression?

We can define a CTE as an alternative to a derived table. In a small way, CTE simplifies complex joins and subqueries, improving the readability of the queries. CTE is part of ANSI SQL 99 and was introduced in MySQL 8.0.1. The same feature is available even on Percona Server for MySQL 8.0.

The main reasons for using CTE are:

  • Better readability of the queries
  • Can be referenced multiple times in the same query
  • Improved performance
  • A valid alternative to a VIEW, if your user cannot create VIEWs
  • Easier chaining of multiple CTE
  • Possibility to create recursive queries: this can be really useful when dealing with hierarchical data

SELECT , UPDATE, and DELETE statements can reference the CTE.

Note: for the examples in this article, I’ll use the world database you can download from the MySQL site .

How to Create and Use a CTE

We have said that a CTE is like a derived table when using a subquery, but the declaration is moved before the main query. A new dedicated clause is needed: WITH .

Let’s start with a subquery with a derived table to find the most populated countries in Europe:

mysql> SELECT Name, Population 
    -> FROM (SELECT * FROM country WHERE continent='Europe') AS derived_t 
    -> ORDER BY Population DESC LIMIT 5;
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Russian Federation |  146934000 |
| Germany            |   82164700 |
| United Kingdom     |   59623400 |
| France             |   59225700 |
| Italy              |   57680000 |
+--------------------+------------+

Let’s rewrite it using the CTE instead:

mysql> WITH cte AS (SELECT * FROM country WHERE continent='Europe')
    -> SELECT Name, Population 
    -> FROM cte 
    -> ORDER BY Population DESC LIMIT 5;
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Russian Federation |  146934000 |
| Germany            |   82164700 |
| United Kingdom     |   59623400 |
| France             |   59225700 |
| Italy              |   57680000 |
+--------------------+------------+

The syntax is quite simple. Before your query, using WITH, you can define the CTE or even multiple CTEs. After that, you can reference in the query all the CTEs as many times as you need. You can think about a CTE as a sort pre-materialized query available as a temporary table for the scope of your main query.

We can also specify the column names if a parenthesized list of names follows the CTE name:

mysql> WITH cte(eur_name, eur_population) AS (SELECT Name, Population FROM country WHERE continent='Europe')
    -> SELECT eur_name, eur_population
    -> FROM cte
    -> ORDER BY eur_opulation DESC LIMIT 5;
+--------------------+----------------+
| eur_name           | eur_population |
+--------------------+----------------+
| Russian Federation |      146934000 |
| Germany            |       82164700 |
| United Kingdom     |       59623400 |
| France             |       59225700 |
| Italy              |       57680000 |
+--------------------+----------------+

CTE can also be used as the data source for updating other tables as in the following examples:

# create a new table 
mysql> CREATE TABLE country_2020 ( Code char(3), Name char(52), Population_2020 int, PRIMARY KEY(Code) );
Query OK, 0 rows affected (0.10 sec)
 
# copy original data
mysql> INSERT INTO country_2020 SELECT Code, Name, Population FROM country;
Query OK, 239 rows affected (0.01 sec)
Records: 239  Duplicates: 0  Warnings: 0
 
# increase population of european countries by 10%
mysql> WITH cte(eur_code, eur_population) AS (SELECT Code, Population FROM country WHERE continent='Europe')  
    -> UPDATE country_2020, cte 
    -> SET Population_2020 = ROUND(eur_population*1.1) 
    -> WHERE Code=cte.eur_code;
Query OK, 46 rows affected (0.01 sec)
Rows matched: 46  Changed: 46  Warnings: 0
 
# delete from the new table all non-europian countries
mysql> WITH cte AS (SELECT Code FROM country WHERE continent <> 'Europe') 
    -> DELETE country_2020 
    -> FROM country_2020, cte 
    -> WHERE country_2020.Code=cte.Code;
Query OK, 193 rows affected (0.02 sec)
 
mysql> SELECT * FROM country_2020 ORDER BY Population_2020 DESC LIMIT 5;
+------+--------------------+-----------------+
| Code | Name               | Population_2020 |
+------+--------------------+-----------------+
| RUS  | Russian Federation |       161627400 |
| DEU  | Germany            |        90381170 |
| GBR  | United Kingdom     |        65585740 |
| FRA  | France             |        65148270 |
| ITA  | Italy              |        63448000 |
+------+--------------------+-----------------+

CTE can also be used for INSERT … SELECT queries like the following:

mysql> CREATE TABLE largest_countries (Code char(3), Name char(52), SurfaceArea decimal(10,2), PRIMARY KEY(Code) );
Query OK, 0 rows affected (0.08 sec)
 
mysql> INSERT INTO largest_countries
    -> WITH cte AS (SELECT Code, Name, SurfaceArea FROM country ORDER BY SurfaceArea DESC LIMIT 10)
    -> SELECT * FROM cte;
Query OK, 10 rows affected (0.02 sec)
Records: 10  Duplicates: 0  Warnings: 0
 
mysql> SELECT * FROM largest_countries;
+------+--------------------+-------------+
| Code | Name               | SurfaceArea |
+------+--------------------+-------------+
| ARG  | Argentina          |  2780400.00 |
| ATA  | Antarctica         | 13120000.00 |
| AUS  | Australia          |  7741220.00 |
| BRA  | Brazil             |  8547403.00 |
| CAN  | Canada             |  9970610.00 |
| CHN  | China              |  9572900.00 |
| IND  | India              |  3287263.00 |
| KAZ  | Kazakstan          |  2724900.00 |
| RUS  | Russian Federation | 17075400.00 |
| USA  | United States      |  9363520.00 |
+------+--------------------+-------------+
10 rows in set (0.00 sec)

Let’s think about a CTE like a temporary table pre-calculated or materialized before the main query. Then you can reference that temporary table as many times you need in your query.

Also, you can create multiple CTEs, and all of them can be used in the main query. The syntax is like the following:

WITH cte1 AS (SELECT ... FROM ... WHERE ...),
     cte2 AS (SELECT ... FROM ... WHERE ...)
SELECT ...
FROM table1, table1, cte1, cte2 ....
WHERE .....

Scope of CTE

A CTE can be used even in subqueries, but in such a case, be aware of the scope. The CTE exists for the scope of a single statement.

Consider the following valid queries:

WITH cte AS (SELECT Code FROM country WHERE Population<1000000)
SELECT * FROM city WHERE city.CountryCode IN 
(SELECT Code FROM cte);   # Scope: "cte" is visible to top SELECT
 
SELECT * FROM city WHERE city.CountryCode IN 
(WITH cte AS (SELECT Code FROM country WHERE Population<1000000)
SELECT Code from cte);   # Scope: "cte" is not visible to top SELECT

To avoid any trouble with the scope, the best way to use CTEs is by creating all of them at the beginning of the top query. This way, all CTEs can be used wherever you need, multiple times.

Chaining multiple CTEs

When creating multiple CTEs for a query, another interesting feature is the chaining . It is possible to define any CTE containing one or more references to previous CTEs in the chain.

The following example shows how the chaining can be used. We would like to find out the countries with the highest and lowest population density in the world. We create a chain of three CTEs. The last two contain a reference to the first one.

mysql> WITH density_by_country(country,density) AS 
       (SELECT Name, Population/SurfaceArea 
       FROM country 
       WHERE Population>0 and surfacearea>0), 
       max_density(country,maxdensity,label) AS 
       (SELECT country, density, 'max density' 
       FROM density_by_country 
       WHERE density=(SELECT MAX(density) FROM density_by_country)), 
       min_density(country,mindensity,label) AS 
       (SELECT country, density, 'min density' 
       FROM density_by_country 
       WHERE density=(SELECT MIN(density) FROM density_by_country)) 
       SELECT * FROM max_density UNION ALL SELECT * FROM min_density;
+-----------+------------+-------------+
| country   | maxdensity | label       |
+-----------+------------+-------------+
| Macao     | 26277.7778 | max density |
| Greenland | 0.0259     | min density |
+-----------+------------+-------------+

Let’s think now how you can rewrite the same query using derived tables instead. You would need to copy (several times) the definition of density_by_country . The final query would be really very large and probably less readable.

Use CTE Instead of VIEW

It could happen that your database user doesn’t have the right to create a VIEW. A CTE doesn’t require specific grants apart from the capability to read table and columns, the same for any regular SELECT.

Then you can use CTE instead of a VIEW. Apart from the grants options, a CTE has generally better performance than a VIEW, as we’ll show later.

Let’s create a view and run a query using it.

mysql> CREATE VIEW city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode);
 
mysql> SELECT name, city_pop_by_country.sum_population/country.population ratio 
       FROM country, city_pop_by_country 
       WHERE country.code=city_pop_by_country.countrycode 
         AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+--------------------+--------+
| name               | ratio  |
+--------------------+--------+
| Bangladesh         | 0.0664 |
| Brazil             | 0.5048 |
| China              | 0.1377 |
| Germany            | 0.3194 |
| Egypt              | 0.2933 |
| Ethiopia           | 0.0510 |
| Indonesia          | 0.1767 |
| India              | 0.1216 |
| Iran               | 0.3845 |
| Japan              | 0.6153 |
| Mexico             | 0.6043 |
| Nigeria            | 0.1557 |
| Pakistan           | 0.2016 |
| Philippines        | 0.4072 |
| Russian Federation | 0.4706 |
| Turkey             | 0.4254 |
| United States      | 0.2825 |
| Vietnam            | 0.1173 |
+--------------------+--------+
 
# here is the explain of the query using the VIEW
mysql> EXPLAIN SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, city_pop_by_country WHERE country.code=city_pop_by_country.countrycode AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY     | country    | NULL       | ALL   | PRIMARY       | NULL        | NULL    | NULL               |  239 |    33.33 | Using where |
|  1 | PRIMARY     | <derived3> | NULL       | ref   | <auto_key0>   | <auto_key0> | 12      | world.country.Code |   16 |   100.00 | NULL        |
|  3 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
|  2 | SUBQUERY    | <derived4> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL               | 4046 |   100.00 | NULL        |
|  4 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+

Let’s try now to rewrite the same query using the CTE.

mysql> WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode)
       SELECT name, city_pop_by_country.sum_population/country.population ratio 
       FROM country, city_pop_by_country 
       WHERE country.code=city_pop_by_country.countrycode 
         AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+--------------------+--------+
| name               | ratio  |
+--------------------+--------+
| Bangladesh         | 0.0664 |
| Brazil             | 0.5048 |
| China              | 0.1377 |
| Germany            | 0.3194 |
| Egypt              | 0.2933 |
| Ethiopia           | 0.0510 |
| Indonesia          | 0.1767 |
| India              | 0.1216 |
| Iran               | 0.3845 |
| Japan              | 0.6153 |
| Mexico             | 0.6043 |
| Nigeria            | 0.1557 |
| Pakistan           | 0.2016 |
| Philippines        | 0.4072 |
| Russian Federation | 0.4706 |
| Turkey             | 0.4254 |
| United States      | 0.2825 |
| Vietnam            | 0.1173 |
+--------------------+--------+
 
# here is the EXPLAIN of the query using CTE
mysql> EXPLAIN WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city GROUP BY countrycode)
SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, city_pop_by_country WHERE country.code=city_pop_by_country.countrycode AND country.population > (SELECT 10*AVG(sum_population) FROM city_pop_by_country);
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY     | country    | NULL       | ALL   | PRIMARY       | NULL        | NULL    | NULL               |  239 |    33.33 | Using where |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 12      | world.country.Code |   16 |   100.00 | NULL        |
|  3 | SUBQUERY    | <derived2> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL               | 4046 |   100.00 | NULL        |
|  2 | DERIVED     | city       | NULL       | index | CountryCode   | CountryCode | 12      | NULL               | 4046 |   100.00 | NULL        |
+----+-------------+------------+------------+-------+---------------+-------------+---------+--------------------+------+----------+-------------+

Taking a look at the two execution plans, we can notice that with the query, there are DERIVED stages. The materialization of the VIEW is needed more times, anytime the view is referenced.

The sample database is small, but we can enable the profiling to compare the execution time of the two queries.

mysql> SET profiling=1;
 
# execute the queries several times
 
mysql> show profiles;
+----------+------------+-----------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                         |
+----------+------------+-----------------------------------------------------------------------------------------------+
...
...                                                                                                                                                                                                                                                                                             |
|       35 | 0.00971925 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       36 | 0.00963100 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       37 | 0.00976900 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       38 | 0.00963875 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       39 | 0.00971200 | SELECT name, city_pop_by_country.sum_population/country.population ratio FROM country, ...    |
|       40 | 0.00546550 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       41 | 0.00546975 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       42 | 0.00550325 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       43 | 0.00548000 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
|       44 | 0.00545675 | WITH city_pop_by_country AS (SELECT countrycode, SUM(population) sum_population FROM city ... |
+----------+------------+-----------------------------------------------------------------------------------------------+

The execution time of the query with the view is around 0.0097 seconds, while with CTE, it is around 0.0054 seconds. So, the CTE is faster. Using larger tables and having more references of the view means the difference between the queries will be more relevant.

Using CTE instead of a VIEW is more efficient because only a single materialization is needed, and the temporary table created can be referenced many times in the main query.

Conclusion

We have introduced the new Common Table Expression feature available on MySQL 8.0. Using CTE, you can simplify, in most cases, the readability of the queries, but you can also use CTE instead of VIEWs to improve the overall performance.

Using CTE, it is also possible to create a recursive query. In the next article of this series, we’ll see examples about how to use recursive CTE to generate series or to query hierarchies.


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

数据结构与算法分析(Java版)(英文原版)

数据结构与算法分析(Java版)(英文原版)

(美)Clifford A.Shaffer / 电子工业出版社 / 2002-5 / 39.00元

《数据结构与算法分析(C++版)(第2版)》采用程序员最爱用的面向对象C++语言来描述数据结构和算法,并把数据结构原理和算法分析技术有机地结合在一起,系统介绍了各种类型的数据结构和排序、检索的各种方法。作者非常注意对每一种数据结构的不同存储方法及有关算法进行分析比较。书中还引入了一些比较高级的数据结构与先进的算法分析技术,并介绍了可计算性理论的一般知识。本版的重要改进在于引入了参数化的模板,从而提......一起来看看 《数据结构与算法分析(Java版)(英文原版)》 这本书的介绍吧!

随机密码生成器
随机密码生成器

多种字符组合密码

URL 编码/解码
URL 编码/解码

URL 编码/解码

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换