文档课题:oracle直方图——FREQUENCY&HEIGHT BALANCED.
1、理论知识
当执行SQL语句时CBO如何选择执行计划?此时就不得不提及数据库统计信息,其帮助CBO做出正确判断,而统计信息包括多方面内容,接下来重点讨论直方图,因为通过直方图CBO可知一个列中不同值所占总数的比例.
Oracle的直方图有两种:
频率直方图(FREQUENCY HISTOGRAM):当列中Distinct_keys较少(小于254),若不手工指定直方图桶数(BUCKET),Oracle会自动创建频率直方图,并且桶数(BUCKET)等于Distinct_Keys.
高度平衡直方图(HEIGHT BALANCED):当列中Distinct_keys大于254,若不手工指定直方图桶数(BUCKET),Oracle就会自动创建高度平衡直方图.
2、实验测试
测试数据库:oracle 11.2.0.4
2.1、建测试表及收集统计信息
--建测试表并收集所有列直方图的统计信息.
LEO@orcl> create table test as select * from dba_objects;
Table created.
SYS@orcl> begin
dbms_stats.gather_table_stats('LEO',
'TEST',
degree => 1,
estimate_percent => 100,
method_opt => 'for all columns size skewonly',
no_invalidate => false,
cascade => true);
9 end;
10 /
PL/SQL procedure successfully completed.
--查统计信息搜集情况
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'LEO'
and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
--------------- ---------- ----------- ----------- --------------- -----------
OWNER 91175 24 .03 FREQUENCY 24
OBJECT_NAME 91175 54521 59.8 HEIGHT BALANCED 254
SUBOBJECT_NAME 91175 105 .12 FREQUENCY 105
OBJECT_ID 91175 91175 100 NONE 1
DATA_OBJECT_ID 91175 8622 9.46 HEIGHT BALANCED 254
OBJECT_TYPE 91175 44 .05 FREQUENCY 44
CREATED 91175 1061 1.16 HEIGHT BALANCED 254
LAST_DDL_TIME 91175 1251 1.37 HEIGHT BALANCED 254
TIMESTAMP 91175 1224 1.34 HEIGHT BALANCED 254
STATUS 91175 1 0 FREQUENCY 1
TEMPORARY 91175 2 0 FREQUENCY 2
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
--------------- ---------- ----------- ----------- --------------- -----------
GENERATED 91175 2 0 FREQUENCY 2
SECONDARY 91175 2 0 FREQUENCY 2
NAMESPACE 91175 20 .02 FREQUENCY 20
EDITION_NAME 91175 0 0 NONE 0
15 rows selected.
说明:如上查询可知owner行的distinct_keys为24,,小于254则默认的桶数为24,属于频率直方图.而object_name行的distinct_keys为54521,大于254则默认的桶数为254,属于高度平衡直方图.
2.2、相关测试
2.2.1、有直方图的情况
2.2.1.1、列中重复值高的情况
--测试直方图作用,先在owner列创建索引
LEO@orcl> create index idx_test_owner on test(owner);
Index created.
--查看SQL语句执行计划
LEO@orcl> set autotrace traceonly
LEO@orcl> select * from test where owner='SYS';
40208 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40208 | 3848K| 364 (1)| 00:00:05 |
|* 1 | TABLE ACCESS FULL| TEST | 40208 | 3848K| 364 (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3956 consistent gets
0 physical reads
0 redo size
2044704 bytes sent via SQL*Net to client
29999 bytes received via SQL*Net from client
2682 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40208 rows processed
注意:owner列虽然存在索引,但此处为什么不走索引呢?因为表test共有91175条数据,而owner='SYS'有40208条数据,若此时走索引首先要在索引上找到对应的rowid,然后再通过rowid回表,因此消耗的资源反而更大.此处全表扫描的consistent gets为3956,再看执行计划rows列id=1 CBO预估返回40208行,而最终结果确实也是返回40208行.
2.2.1.2、列中重复值低的情况
--以下测试列中重复值低时存在索引的情况
LEO@orcl> select * from test where owner='SCOTT';
8 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 510444911
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 784 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 8 | 784 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TEST_OWNER | 8 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6 consistent gets
1 physical reads
0 redo size
2273 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
8 rows processed
注意:执行select * from test where owner= 'SCOTT'语句时CBO选择走索引,因为owner='SCOTT'只有8行,此时走索引只需拿到8行rowid然后回表,比起全表扫描大大节省资源,提高查询速度.
2.2.2、无直方图的情况
2.2.2.1、删除直方图
--为进一步说明直方图作用,现将其删除.
begin
dbms_stats.gather_table_stats('LEO',
'TEST',
estimate_percent => 100,
method_opt => 'for all columns size 1',
no_invalidate => false,
degree => 1,
cascade => true);
end;
/
--查看直方图信息.
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'LEO'
and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER 91175 24 .03 NONE 1
OBJECT_NAME 91175 54521 59.8 NONE 1
SUBOBJECT_NAME 91175 105 .12 NONE 1
OBJECT_ID 91175 91175 100 NONE 1
DATA_OBJECT_ID 91175 8622 9.46 NONE 1
OBJECT_TYPE 91175 44 .05 NONE 1
CREATED 91175 1061 1.16 NONE 1
LAST_DDL_TIME 91175 1251 1.37 NONE 1
TIMESTAMP 91175 1224 1.34 NONE 1
STATUS 91175 1 0 NONE 1
TEMPORARY 91175 2 0 NONE 1
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
GENERATED 91175 2 0 NONE 1
SECONDARY 91175 2 0 NONE 1
NAMESPACE 91175 20 .02 NONE 1
EDITION_NAME 91175 0 0 NONE 0
15 rows selected.
注意:HISTOGRAM为'NONE',直方图均被删除.
2.2.2.2、列中重复值高的情况
LEO@orcl> select * from test where owner='SYS';
40208 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 510444911
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3799 | 363K| 111 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 3799 | 363K| 111 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_TEST_OWNER | 3799 | | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6436 consistent gets
0 physical reads
0 redo size
4606034 bytes sent via SQL*Net to client
29999 bytes received via SQL*Net from client
2682 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
40208 rows processed
注意:删除直方图后再来执行select * from test where owner='SYS',发现此时CBO选择走索引,原本走全表扫描更优.为什么此时选择走索引呢?执行计划显示rows列是3799,CBO认为只需返回3799行数据,从91175中选择3799行数据,CBO必然认为走索引优于走全表扫描.可以看到现在的consistent gets为6436,大于之前走全表扫描的3956.
2.2.2.3、列中重复值低的情况
LEO@orcl> select * from test where owner='SCOTT';
8 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 510444911
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3799 | 363K| 111 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 3799 | 363K| 111 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_TEST_OWNER | 3799 | | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
2273 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
8 rows processed
说明:删除直方图后select * from test where owner='SCOTT'依然走索引,rows列依然是3799行,最终真正返回的是8行,原来在oracle数据库中,CBO会默认目标列的数据在最小值LOW_VALUE和最大值HIGH_VALUE之间是均匀分布的,并且会按照均匀分布原则来计算对目标列施加查询条件后的可选择率以及结果集的cardinality,进而来计算成本值并选择执行计划.
3、绑定变量窥探
收集直方图可能会导致绑定变量窥探(该问题在9i、10g几乎没什么好办法解决,从11g引进自适应游标,解决绑定变量窥探)
说明:如下测试数据库版本为ORACLE 10.2.0.1.0,主要进行绑定变量的相关测试.
3.1、建测试表及搜集统计信息
LEO@orcl> create table test as select * from dba_objects;
Table created.
SYS@orcl> exec dbms_stats.gather_table_stats(ownname => 'LEO',tabname => 'TEST',method_opt => 'for all columns size skewonly');
PL/SQL procedure successfully completed.
--查询统计信息及直方图
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'LEO'
and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER 49745 15 .03 FREQUENCY 15
OBJECT_NAME 49745 29770 59.85 HEIGHT BALANCED 254
SUBOBJECT_NAME 49745 46 .09 FREQUENCY 46
OBJECT_ID 49745 49745 100 NONE 1
DATA_OBJECT_ID 49745 3620 7.28 HEIGHT BALANCED 254
OBJECT_TYPE 49745 29 .06 FREQUENCY 29
CREATED 49745 1022 2.05 HEIGHT BALANCED 254
LAST_DDL_TIME 49745 1048 2.11 HEIGHT BALANCED 254
TIMESTAMP 49745 1033 2.08 HEIGHT BALANCED 254
STATUS 49745 1 0 FREQUENCY 1
TEMPORARY 49745 2 0 FREQUENCY 2
GENERATED 49745 2 0 FREQUENCY 2
SECONDARY 49745 1 0 FREQUENCY 1
13 rows selected.
--建索引
LEO@orcl> create index idx_test_owner on test(owner);
Index created.
3.2、无绑定变量情况
LEO@orcl> set autotrace traceonly
LEO@orcl> select * from test where owner='SYS';
22902 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23093 | 2097K| 157 (2)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| TEST | 23093 | 2097K| 157 (2)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2194 consistent gets
0 physical reads
0 redo size
1148025 bytes sent via SQL*Net to client
17255 bytes received via SQL*Net from client
1528 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
22902 rows processed
说明:此时执行计划为全表扫描.
3.3、有绑定变量情况
--如下测试使用绑定变量的场景
LEO@orcl> variable v_name varchar2(10);
LEO@orcl> exec :v_name :='SCOTT';
PL/SQL procedure successfully completed.
LEO@orcl> select * from test where owner=:v_name;
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 510444911
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3316 | 301K| 95 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 3316 | 301K| 95 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_TEST_OWNER | 3316 | | 8 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"=:V_NAME)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1870 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
--现在将绑定变量改为SYS
LEO@orcl> exec :v_name:='SYS';
PL/SQL procedure successfully completed.
LEO@orcl> select * from test where owner=:v_name;
22902 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 510444911
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3316 | 301K| 95 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 3316 | 301K| 95 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_TEST_OWNER | 3316 | | 8 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"=:V_NAME)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3622 consistent gets
45 physical reads
0 redo size
2525237 bytes sent via SQL*Net to client
17255 bytes received via SQL*Net from client
1528 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
22902 rows processed
说明:如上所示使用绑定变量后当选择SYS用户时,即使有收集直方图,CBO也没有选择走全表扫描而是走索引,执行计划明显不合理.
4、直方图搜集准则
收集直方图并非一定会带来好处,注意以下两点:
A、如果目标列的数据是均匀分布的,比如主键列,有唯一索引的列,则不需要对此类列收集直方图的统计信息;
B、对于从来没有在where条件中出现的列,不管其数据分布是否均匀,都无须对其收集直方图统计信息.
参考文档:https://blog.csdn.net/cuiwangxie1183/article/details/100483109?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168932555516800180625844%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=168932555516800180625844&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-2-100483109-null-null.268^v1^koosearch&utm_term=%E7%9B%B4%E6%96%B9%E5%9B%BE&spm=1018.2226.3001.4450