MySQL UPDATE with WHERE + Subquery: A Case Study

2022-07-27

Today a developer asked me to modify a batch of rows with the following SQL:

update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select b.id from settle_studio_income a,finance_income_flow b  where 
 a.source_code = b.business_no 
 and a.studio_code = b.merchant_id
 and a.income_time = b.income_time
 and b.re_item_code= 'SZ2024'
 and a.`source` = 4 and a.income_time = '2022-04-30 23:59:59' and b.income_time  = '2022-04-30 23:59:59'  and a.`status` =400
 and b.settle_status = 200);

Running it produced this error:

You can't specify target table 'finance_income_flow' for update in FROM clause

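The error means that the table being updated cannot also be selected from in a subquery in the WHERE clause of the same statement.
So I took a different approach: create a temporary table, insert all the ids returned by the subquery into it, and then run the update against that table: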

update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
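
The post does not show how t_bid_tmp was built. The following is a minimal sketch of that step, not the author's actual DDL. The column type is an assumption: a nullable INT with a secondary index named idx_idd would be consistent with the key and key_len values that appear in the execution plans later in this post. The SELECT is the original subquery rewritten with explicit JOIN syntax.

```sql
-- Assumed definition: idd as a nullable INT with a secondary index.
-- This matches key = idx_idd and key_len = 5 in the EXPLAIN output,
-- but the real table definition is not given in the post.
CREATE TABLE t_bid_tmp (
  idd INT,
  KEY idx_idd (idd)
);

-- Fill it with the ids that the original subquery would have returned.
INSERT INTO t_bid_tmp (idd)
SELECT b.id
FROM settle_studio_income a
JOIN finance_income_flow b
  ON a.source_code = b.business_no
 AND a.studio_code = b.merchant_id
 AND a.income_time = b.income_time
WHERE b.re_item_code = 'SZ2024'
  AND a.`source` = 4
  AND a.income_time = '2022-04-30 23:59:59'
  AND b.income_time = '2022-04-30 23:59:59'
  AND a.`status` = 400
  AND b.settle_status = 200;
```

Reading finance_income_flow here is allowed because the statement writes to a different table, t_bid_tmp.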

This statement only needs to update 49310 rows, yet after two minutes it still had not finished.
show processlist reported the thread state as preparing.
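
For reference, a sketch of how the thread state can be inspected. The 60-second threshold and the choice of columns are illustrative assumptions, not something taken from the original session.

```sql
-- Show every connection with its state and the full statement text.
SHOW FULL PROCESSLIST;

-- Or list only sessions that are not idle and have been busy for a while
-- (the 60-second cutoff is an arbitrary example value).
SELECT id, user, db, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep'
  AND time > 60
ORDER BY time DESC;
```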

update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
Rows affected: [49310], elapsed: 201896 ms

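The statement eventually succeeded, but it took 201896 ms, about 3 minutes 22 seconds, which is clearly abnormal. Let's track down the cause.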

First, look at the execution plan for this statement.

explain update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
| id           | select_type           | table               | partitions           | type           | possible_keys           | key           | key_len           | ref           | rows           | filtered           | Extra           |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
| 1            | UPDATE                | finance_income_flow |                      | index          |                         | PRIMARY       | 4                 |               | 31431906       |                100 | Using where     |
| 2            | DEPENDENT SUBQUERY    | t_bid_tmp           |                      | index_subquery | idx_idd                 | idx_idd       | 5                 | func          | 1              |                100 | Using index     |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+

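The execution plan tells us three key things:
1. finance_income_flow is read with a full index scan (type = index) over an estimated 31431906 rows;
2. t_bid_tmp has the higher id (2), which would normally suggest that it is evaluated before finance_income_flow (id 1);
3. the select_type of t_bid_tmp is DEPENDENT SUBQUERY, meaning the subquery depends on the outer query and is evaluated for each outer row.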
 
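This is awkward: the subquery depends on the outer query, and the outer query is a full index scan over 31431906 rows, so this statement is bound to be slow.
The plan itself leaves no doubt about why it is slow, but it does not match the usual intuition. Shouldn't the subquery run first, with the outer statement then updating only the rows it returns?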

With that question in mind, look at a second execution plan:

explain select count(*) from finance_income_flow where id in(select idd from t_bid_tmp);
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
| id           | select_type           | table               | partitions           | type           | possible_keys           | key           | key_len           | ref               | rows           | filtered           | Extra                               |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
| 1            | SIMPLE                | t_bid_tmp           |                      | index          | idx_idd                 | idx_idd       | 5                 |                   | 49664          |               98.8 | Using where; Using index; LooseScan |
| 1            | SIMPLE                | finance_income_flow |                      | eq_ref         | PRIMARY                 | PRIMARY       | 4                 | hlj.t_bid_tmp.idd | 1              |                100 | Using index                         |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+

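This execution plan shows three things:
1. t_bid_tmp is read first and drives the join with finance_income_flow;
2. both tables have select_type SIMPLE, so the subquery has been converted into a join;
3. t_bid_tmp is scanned through idx_idd (an estimated 49664 rows), and each value is looked up in finance_income_flow by primary key (type = eq_ref, rows = 1), which is very efficient.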
 
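Comparing the two plans, MySQL handles the SELECT and the UPDATE in completely different ways, and the difference in efficiency is large.
From what I could find: when an UPDATE filters with WHERE ... IN (subquery), MySQL reads all rows of the outer table, checks each one against the subquery, and then updates the matches. That means the larger the outer table, the longer the whole statement takes.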
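
Whether the UPDATE gets the slow plan appears to depend on the server version. The MySQL 8.0 optimizer documentation states that from 8.0.21 onward a single-table UPDATE or DELETE with an IN or EXISTS subquery can use the semijoin transformation, provided the statement has no ORDER BY or LIMIT and semijoin is enabled. The post does not say which version was in use, so the check below is only a sketch for confirming the behavior on a given server.

```sql
-- Which server version is this?
SELECT VERSION();

-- Is the semijoin optimization switched on? Look for "semijoin=on" in the output.
SELECT @@optimizer_switch;

-- Re-check the plan. A DEPENDENT SUBQUERY row together with type = index on
-- finance_income_flow indicates the slow, row-by-row evaluation.
EXPLAIN
UPDATE finance_income_flow
SET settle_batch_id = 2096,
    settle_status = 400,
    settle_time = '2022-07-20 20:00:00',
    update_time = NOW()
WHERE id IN (SELECT idd FROM t_bid_tmp);
```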

Is there a good fix for statements like this? Yes: rewrite the update as a multi-table UPDATE that joins the two tables.

update finance_income_flow a, t_bid_tmp b
set a.settle_batch_id = 2096,a.settle_status = 400,a.settle_time='2022-07-20 20:00:00',a.update_time= now()
where a.id=b.idd;
Rows affected: [49310], elapsed: 1377 ms.
explain update finance_income_flow a, t_bid_tmp b
set a.settle_batch_id = 2096,a.settle_status = 400,a.settle_time='2022-07-20 20:00:00',a.update_time= now()
where a.id=b.idd;
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
| id           | select_type           | table           | partitions           | type           | possible_keys           | key           | key_len           | ref           | rows           | filtered           | Extra                    |
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
| 1            | SIMPLE                | b               |                      | index          | idx_idd                 | idx_idd       | 5                 |               | 49664          |                100 | Using where; Using index |
| 1            | UPDATE                | a               |                      | eq_ref         | PRIMARY                 | PRIMARY       | 4                 | hlj.b.idd     | 1              |                100 |                          |
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+

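The execution plan confirms it: t_bid_tmp is scanned through its index and each row of finance_income_flow is located by primary key (eq_ref). This form is by far the fastest of the three (1377 ms versus 201896 ms) and behaves the way one would expect.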
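
The same join form might also remove the need for the temporary table, because the "can't specify target table" error applies to subqueries that select from the update target, not to joins in a multi-table UPDATE. The statement below is a sketch of that idea and was not run in the original case. Its performance depends on which indexes exist on settle_studio_income and finance_income_flow, which the post does not describe, so check it with EXPLAIN before executing.

```sql
-- finance_income_flow appears only once, as the join target b,
-- so it is never read inside a subquery of its own UPDATE.
UPDATE finance_income_flow b
JOIN settle_studio_income a
  ON a.source_code = b.business_no
 AND a.studio_code = b.merchant_id
 AND a.income_time = b.income_time
SET b.settle_batch_id = 2096,
    b.settle_status = 400,
    b.settle_time = '2022-07-20 20:00:00',
    b.update_time = NOW()
WHERE b.re_item_code = 'SZ2024'
  AND a.`source` = 4
  AND a.income_time = '2022-04-30 23:59:59'
  AND b.income_time = '2022-04-30 23:59:59'
  AND a.`status` = 400
  AND b.settle_status = 200;
```

If several rows in settle_studio_income match the same row in finance_income_flow, a multi-table UPDATE still updates that row only once, so the result should match the IN form.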
