内容简介:有时候我们会遇到,由于统计信息不准确导致优化器生成了一个错误的执行计划(或者这样表达:一个较差的执行计划),从而引起了系统性能问题。那么如果我们怀疑这个错误的执行计划是由于统计信息不准确引起的。那么我们如何判断统计信息不准确呢?当然首先得去查看实际执行计划中,统计信息的相关数据是否与实际情况有较大的出入,下面我们抛开这个大命题,仅仅从统计信息层面去查看统计信息的更新时间,统计信息的采样行数等情况。1:首先,我们要查查统计信息是什么时候更新的。
SQL Server 查找统计信息的相关采样信息
有时候我们会遇到,由于统计信息不准确导致优化器生成了一个错误的执行计划(或者这样表达:一个较差的执行计划),从而引起了系统性能问题。那么如果我们怀疑这个错误的执行计划是由于统计信息不准确引起的。那么我们如何判断统计信息不准确呢?当然首先得去查看实际执行计划中,统计信息的相关数据是否与实际情况有较大的出入,下面我们抛开这个大命题,仅仅从统计信息层面去查看统计信息的更新时间,统计信息的采样行数等情况。
1:首先,我们要查查统计信息是什么时候更新的。
2:其次,我们查看统计信息的采样信息:采样选取的行数、 自上次更新统计信息以来前导统计信息列(构建直方图的列)的总修改次数。。。
查看统计信息的最后更新时间。
方法1:
--查看统计信息的更新时间 DECLARE @TableName NVARCHAR(128); SET @TableName = 'dbo.pbCutClothCost'; SELECT @TableName AS Table_Name, name AS Stats_Name , STATS_DATE(object_id, stats_id) AS Last_Stats_Update FROM sys.stats WHERE object_id = OBJECT_ID(@TableName) ORDER BY 2 DESC;
如上所示,我们通过这个脚本查看某个表,所有的统计信息最后一次更新时间。如果你需要查看某个具体的统计信息的最后更新时间,那么在这个 SQL 的基础上增加相关查询条件即可。
方法2:
-- 查看统计信息的更新时间
EXEC sp_autostats 'dbo.pbCutClothCost' ;
方法3:
还有一种方法可以通过 sys.dm_db_stats_properties 返回统计信息的更新时间,不过这个DMF只有SQL Server 2008 R2 SP2这个版本之后的才有。
| 列名 |
数据类型 |
Description |
| object_id |
int |
要返回统计信息对象属性的对象(表或索引视图)的 ID 。 |
| stats_id |
int |
统计信息对象的 ID 。 在表或索引视图中是唯一的。 有关详细信息,请参阅 sys.stats (Transact-SQL) 。 |
| last_updated |
datetime2 |
上次更新统计信息对象的日期和时间。 有关详细信息,请参阅此页中的 备注 部分。 |
| rows |
bigint |
上次更新统计信息时表或索引视图中的总行数。 如果筛选统计信息或者统计信息与筛选索引对应,该行数可能小于表中的行数。 |
| rows_sampled |
bigint |
用于统计信息计算的抽样总行数。 |
| Step |
int |
直方图中的值范围数(步长)( Number of steps in the histogram )。 有关详细信息,请参阅 DBCC SHOW_STATISTICS (Transact-SQL) 。 |
| unfiltered_rows |
bigint |
应用筛选表达式(用于筛选的统计信息)之前表中的总行数。 如果未筛选统计信息,则 unfiltered_rows 等于行列中返回的值。 |
| modification_counter |
bigint |
自上次更新统计信息以来前导统计信息列(构建直方图的列)的总修改次数。 |
| persisted_sample_percent |
float |
持久样本百分比用于未显式指定采样百分比的统计信息更新。 如果值为零,则不为此统计信息设置持久样本百分比。 |
查看采样的相关信息
SELECT sch.name + '.' + so.name AS table_name , so.object_id , ss.name AS stat_name , ds.stats_id , ds.last_updated , ds.rows , ds.rows_sampled , ds.steps , ds.unfiltered_rows , ds.modification_counter FROM sys.stats ss JOIN sys.objects so ON ss.object_id = so.object_id JOIN sys.schemas sch ON so.schema_id = sch.schema_id CROSS APPLY sys.dm_db_stats_properties(ss.object_id,ss.stats_id) ds WHERE so.name = N'pbCutClothCost' AND LEFT(ss.name, 4) != '_WA_';
如上截图,索引IX_CutClothCost的统计信息有更新,是因为在执行上面脚本,我更新了这个索引的统计信息。通过rows与实际记录数对比、 modification_counter信息,我们从而有个大概的判断,这些统计 信息是否过时。是否采样的比例太小等。
SELECT sch.name + '.' + so.name AS table_name , so.object_id , ss.name AS stat_name , ds.stats_id , ds.last_updated , ds.rows , ds.rows_sampled , ds.rows_sampled/ds.rows *100 AS sample_rate , ds.steps , ds.unfiltered_rows , ds.modification_counter FROM sys.stats ss JOIN sys.objects so ON ss.object_id = so.object_id JOIN sys.schemas sch ON so.schema_id = sch.schema_id CROSS APPLY sys.dm_db_stats_properties(ss.object_id,ss.stats_id) ds WHERE so.name = N'pbCutClothCost' AND LEFT(ss.name, 4) != '_WA_';
当然也可以使用DBCC SHOW_STATISTICS来查看统计信息的详细信息。
DBCC SHOW_STATISTICS ( 'dbo.pbCutClothCost' , IX_pbCutClothCost_N1 )
另外一个维度来判别统计信息过时,是通过时间维度(最后一次统计信息更新距今的时间)。这个相关上面维度的数据而言,往往不是特别准确,但是也有参考意义。
SELECT sch.name + '.' + so.name AS "Table", ss.name AS"Statistic", CASE WHEN ss.auto_Created = 0 AND ss.user_created = 0 THEN 'Index Statistic' WHEN ss.auto_created = 0 AND ss.user_created = 1 THEN 'USER Created' WHEN ss.auto_created = 1 AND ss.user_created = 0 THEN 'Auto Created' WHEN ss.AUTO_created = 1 AND ss.user_created = 1 THEN 'Not Possible' END AS "Statistic Type", CASE WHEN ss.has_filter = 1 THEN 'Filtered INDEX' WHEN ss.has_filter = 0 THEN 'No Filter' END AS "Filtered", CASE WHEN ss.filter_definition IS NULL THEN '' WHEN ss.filter_definition IS NOT NULL THEN ss.filter_definition END AS "Filter Definition", sp.last_updated AS "Stats Last Updated", sp.rows AS "Rows", sp.rows_sampled AS "Rows Sampled", sp.unfiltered_rows AS "Unfiltered Rows", sp.modification_counter AS "Row Modifications", sp.steps AS "Histogram Steps" FROM sys.stats ss JOIN sys.objects so ON ss.object_id = so.object_id JOIN sys.schemas sch ON so.schema_id = sch.schema_id OUTER APPLY sys.dm_db_stats_properties(so.object_id, ss.stats_id) AS sp WHERE so.TYPE = 'U' AND sp.last_updated < GETDATE() - 7 ORDER BY sp.last_updated DESC;
以前收集过一个查询过时的统计信息的脚本(忘记出自哪里了),这个是通过 Max(ApproximateRows) > 500 AND Max(RowModCtr) > (Max(ApproximateRows)*0.2 + 500 ) 来找出过时的统计信息。如下所示
DECLARE @Major INT ,
@Minor INT ,
@build INT ,
@revision INT ,
@i INT ,
@str NVARCHAR(100) ,
@str2 NVARCHAR(10);
SET @str = CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(100));
SET @str2 = LEFT(@str, CHARINDEX('.', @str));
SET @i = LEN(@str);
SET @str = RIGHT(@str, @i - CHARINDEX('.', @str));
SET @Major = CAST(REPLACE(@str2, '.', '') AS INT);
SET @str2 = LEFT(@str, CHARINDEX('.', @str));
SET @i = LEN(@str);
SET @str = RIGHT(@str, @i - CHARINDEX('.', @str));
SET @Minor = CAST(REPLACE(@str2, '.', '') AS INT);
SET @str2 = LEFT(@str, CHARINDEX('.', @str));
SET @i = LEN(@str);
SET @str = RIGHT(@str, @i - CHARINDEX('.', @str));
SET @build = CAST(REPLACE(@str2, '.', '') AS INT);
SET @revision = CAST(@str AS INT);
IF @Major < 10
SET @i = 1;
ELSE
IF @Major > 10
SET @i = 0;
ELSE
IF @Minor = 50
AND @build >= 4000
SET @i = 0;
ELSE
SET @i = 1;
IF @i = 1
BEGIN
EXEC sp_executesql N';WITH StatTables AS(
SELECT so.schema_id AS ''schema_id'',
so.name AS ''TableName'',
so.object_id AS ''object_id'',
CASE indexproperty(so.object_id, dmv.name, ''IsStatistics'')
WHEN 0 THEN dmv.rows
ELSE (SELECT TOP 1 row_count FROM sys.dm_db_partition_stats ps (NOLOCK) WHERE ps.object_id=so.object_id AND ps.index_id in (1,0))
END AS ''ApproximateRows'',
dmv.rowmodctr AS ''RowModCtr''
FROM sys.objects so (NOLOCK)
INNER JOIN sysindexes dmv (NOLOCK) ON so.object_id = dmv.id
LEFT JOIN sys.indexes si (NOLOCK) ON so.object_id = si.object_id AND so.type in (''U'',''V'') AND si.index_id = dmv.indid
WHERE so.is_ms_shipped = 0
AND dmv.indid<>0
AND so.object_id not in (SELECT major_id FROM sys.extended_properties (NOLOCK) WHERE name = N''microsoft_database_tools_support'')
),
StatTableGrouped AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY TableName) AS seq1,
ROW_NUMBER() OVER(ORDER BY TableName DESC) AS seq2,
TableName,
cast(Max(ApproximateRows) AS bigint) AS ApproximateRows,
cast(Max(RowModCtr) AS bigint) AS RowModCtr,
schema_id,object_id
FROM StatTables st
GROUP BY schema_id,object_id,TableName
HAVING (Max(ApproximateRows) > 500 AND Max(RowModCtr) > (Max(ApproximateRows)*0.2 + 500 ))
)
SELECT
@@SERVERNAME AS InstanceName,
seq1 + seq2 - 1 AS NbOccurences,
SCHEMA_NAME(stg.schema_id) AS ''SchemaName'',
stg.TableName,
CASE OBJECTPROPERTY(stg.object_id, ''TableHasClustIndex'')
WHEN 1 THEN ''Clustered''
WHEN 0 THEN ''Heap''
ELSE ''Indexed View''
END AS ClusteredHeap,
CASE objectproperty(stg.object_id, ''TableHasClustIndex'')
WHEN 0 THEN (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id) - 1
ELSE (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id)
END AS IndexCount,
(SELECT count(*) FROM sys.columns c (NOLOCK) WHERE c.object_id = stg.object_id ) AS ColumnCount ,
(SELECT count(*) FROM sys.stats s (NOLOCK) WHERE s.object_id = stg.object_id) AS StatCount ,
stg.ApproximateRows,
stg.RowModCtr,
stg.schema_id,
stg.object_id
FROM StatTableGrouped stg';
END;
ELSE
BEGIN
EXEC sp_executesql N';WITH StatTables AS(
SELECT so.schema_id AS ''schema_id'',
so.name AS ''TableName'',
so.object_id AS ''object_id''
, ISNULL(sp.rows,0) AS ''ApproximateRows''
, ISNULL(sp.modification_counter,0) AS ''RowModCtr''
FROM sys.objects so (NOLOCK)
JOIN sys.stats st (NOLOCK) ON so.object_id=st.object_id
CROSS APPLY sys.dm_db_stats_properties(so.object_id, st.stats_id) AS sp
WHERE so.is_ms_shipped = 0
AND st.stats_id<>0
AND so.object_id not in (SELECT major_id FROM sys.extended_properties (NOLOCK) WHERE name = N''microsoft_database_tools_support'')
),
StatTableGrouped AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY TableName) AS seq1,
ROW_NUMBER() OVER(ORDER BY TableName DESC) AS seq2,
TableName,
cast(Max(ApproximateRows) AS bigint) AS ApproximateRows,
cast(Max(RowModCtr) AS bigint) AS RowModCtr,
count(*) AS StatCount,
schema_id,object_id
FROM StatTables st
GROUP BY schema_id,object_id,TableName
HAVING (Max(ApproximateRows) > 500 AND Max(RowModCtr) > (Max(ApproximateRows)*0.2 + 500 ))
)
SELECT
@@SERVERNAME AS InstanceName,
seq1 + seq2 - 1 AS NbOccurences,
SCHEMA_NAME(stg.schema_id) AS ''SchemaName'',
stg.TableName,
CASE OBJECTPROPERTY(stg.object_id, ''TableHasClustIndex'')
WHEN 1 THEN ''Clustered''
WHEN 0 THEN ''Heap''
ELSE ''Indexed View''
END AS ClusteredHeap,
CASE objectproperty(stg.object_id, ''TableHasClustIndex'')
WHEN 0 THEN (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id) - 1
ELSE (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id)
END AS IndexCount,
(SELECT count(*) FROM sys.columns c (NOLOCK) WHERE c.object_id = stg.object_id ) AS ColumnCount ,
stg.StatCount,
stg.ApproximateRows,
stg.RowModCtr,
stg.schema_id,
stg.object_id
FROM StatTableGrouped stg';
END;
参考资料:
https://www.sqlskills.com/blogs/erin/new-statistics-dmf-in-sql-server-2008r2-sp2/
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:- CNN 真的需要下采样(上采样)吗?
- Hive的分桶和采样
- 【信号与系统】05 - 滤波、采样和通信
- 加权随机采样 (Weighted Random Sampling)
- 深度卷积神经网络中的降采样
- 能量视角下的GAN模型(二):GAN=“分析”+“采样”
本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
R for Data Science
Hadley Wickham、Garrett Grolemund / O'Reilly Media / 2016-12-25 / USD 39.99
http://r4ds.had.co.nz/一起来看看 《R for Data Science》 这本书的介绍吧!