CxSAST Database Maintenance Guide
Chapter 1 - Introduction
The purpose of the document to provide specific information about Checkmarx SAST (CxSAST) tables regarding their maintenance. It doesn't replace MS SQL Server guidelines and best practices published by official database providers. It refers to sole aspects (key area) of database maintenance: Index and Tables fragmentation.
There are basically two types of fragmentation:
Fragmentation within individual data and index pages (sometimes called internal fragmentation)
Fragmentation within index or table structures consisting of pages (called logical scan fragmentation and extent scan fragmentation)
More commonly, internal fragmentation results from data modifications, such as inserts, updates, and deletes, which can leave empty space on a page. Depending on the table/index schema and the application's characteristics, this empty space may never be reused once it is created and can lead to ever-increasing amounts of unusable space in the database. Wasted space on data/index pages can therefore lead to needing more pages to hold the same amount of data. Not only does this take up more disk space, it also means that a query needs to issue more I/Os to read the same amount of data. All these extra pages occupy additional space in the data cache, therefore taking up more server memory.
Logical scan (or external/extent) fragmentation is caused by an operation called a page split. This occurs when a record has to be inserted on a specific index page (according to the index key definition) but there is not enough space on the page to fit the data being inserted. The page is split in half and roughly 50% of the records moved to a newly allocated page. This new page is usually not physically contiguous with the old page and therefore is refered to as fragmented. Extent scan fragmentation is similar in concept. Fragmentation within the table/index structures affects the ability of the SQL Server to do efficient scans, whether over an entire table/index or bounded by a query WHERE clause (range scan).
For additional information, refer to https://technet.microsoft.com/en-us/library/2008.08.database.aspx.
Chapter 2 - Checkmarx Tables Overview
The CxSAST application has two databases:
CxActivity – contains tables serving auditing persistancy
CxDB – primary database serving ongoing usage
CxSAST inserts data in CxActivity tables without deleting or updating them in the future. Therefore, the risk of fragmentation and as result performance degradation is low.
CxDB database has tables for various functionalities working in different ways. From now, the discussion will be related to the tables dynamic having relatively massive data. These tables are divided to three categories:
Tables List | Description/Purpose | |
---|---|---|
1 | dbo.PathResults, dbo.NodeResults, dbo.ResultsLabels, dbo.ResultsLabelsHistory, dbo.Auxiliary_* | Ongoing growing tables having purging policy as default application behavior |
2 | CxBi.*, dbo.QueryVersion, dbo.ScanRequests, dbo.ScanStatistics, dbo.TaskScans, dbo.LoggedinUser | They serve for analyzing/calculation with removing data at the end of processing |
3 | dbo.Libraries, dbo.ScannedLibraries, dbo.ScannedVulnerabilities, dbo.Scans, dbo.Vulnerabilities | Ongoing growing tables |
Tables from the two first categories have high risk of fragmentation.
Chapter 3 - Monitoring
Instead of rebuilding or reorganizing all indexes on a regular basis (e.g., daily/weekly/monthly) the more sophisticated approach involves using the dynamic management function (DMF) sys.dm_db_index_physical_stats to periodically determine which indexes are fragmented, and then choosing whether and how to operate on those. This function accepts parameters such as the database, database table, and index for which you want to find fragmentation. An example of the function usage is as follows:
SELECT
OBJECT_NAME(ips.object_id) "TblName"
,ips.object_id
,ips.index_id
,(select i.name from sys.indexes i where ips.object_id = i.object_id AND ips.index_id = i.index_id and ips.index_level = 0) "IndexName"
,ips.index_type_desc "IndexType"
,ips.avg_fragmentation_in_percent
,ips.fragment_count
,ips.avg_fragment_size_in_pages
,ips.forwarded_record_count
,ips.alloc_unit_type_desc
,ips.page_count
,ips.index_depth
,ips.avg_page_space_used_in_percent
,ips.record_count
,ips.ghost_record_count
,ips.version_ghost_record_count
,ips.min_record_size_in_bytes
,ips.max_record_size_in_bytes
,ips.avg_record_size_in_bytes
,ips.compressed_page_count
FROM sys.dm_db_index_physical_stats(DB_ID('CxDB'),NULL,NULL,NULL,'<Scanning
Mode>') AS ips WHERE (1=1)
and index_level=0
ORDER BY OBJECT_NAME(ips.object_id),ips.index_id;
Scanning Mode - the mode in which the function is executed determines the level of scanning performed to obtain the statistical data that is used by the function. Mode is specified as
LIMITED - fastest mode and scans the smallest number of pages (min info)
SAMPLED - returns statistics based on a 1% sample of all the pages in the index or heap. If the index or heap has fewer than 10,000 pages, DETAILED mode is used instead of SAMPLED.
DETAILED – heaviest mode and scans all pages and returns all statistics (max info)
The default (NULL) is LIMITED.
For more details see https://msdn.microsoft.com/en-us/library/ms188917(v=sql.110).
Returns size and fragmentation information for the data and indexes of the specified table or view. For an index, one row is returned for each level of the B-tree in each partition. For a heap, one row is returned for the IN_ROW_DATA allocation unit of each partition. For large object (LOB) data, one row is returned for the LOB_DATA allocation unit of each partition. If row-overflow data exists in the table, one row is returned for the ROW_OVERFLOW_DATA allocation unit in each partition.
Along with other information, the following columns are most important for detecting fragmentation:
Returned Column | Description | |||
---|---|---|---|---|
avg_fragmentation_in_percent | This indicates the amount of external fragmentation you have for the given objects. The lower the number the better - as this number approaches 100% the more pages you have in the given index that are not properly ordered. | |||
For heaps, this value is actually the percentage of extent fragmentation and not external fragmentation. | ||||
avg_page_space_used_in_percent | This indicates how dense the pages in your index are, i.e., on average how full each page in the index is (internal fragmentation). The higher the number the better speaking in terms of fragmentation and read-performance. To achieve optimal disk space use, this value should be close to 100% for an index that will not have many random inserts. However, an index that has many random inserts and has very full pages will have an increased number of page splits. This causes more fragmentation. Therefore, in order to reduce page splits, the value should be less than 100%. | |||
fragment_count | A fragment is made up of physically consecutive leaf pages in the same file for an allocation unit. An index has at least one fragment. The maximum fragments an index can have are equal to the number of pages in the leaf level of the index. So the less fragments the more data is stored consecutively. | |||
avg_fragment_size_in_pages | Larger fragments mean that less disk I/O is required to read the same number of pages. Therefore, the larger the avg_fragment_size_in_pages value, the better the range scan performance. | |||
forwarded_record_count | Number of records in a heap that have forward pointers to another data location. (This state occurs during an update, when there is not enough room to store the new row in the original location.) NULL for any allocation unit other than the IN_ROW_DATA allocation units for a heap. NULL for heaps when mode = LIMITED. |
Chapter 4 - Maintenance Options for Reducing Fragmentation
Decision which defragmentation method to use should be based on the degree of fragmentation and table type (as result of running sys.dm_db_index_physical_stats, see the previous chapter). There are two main methods:
Method | When | Comments |
---|---|---|
ALTER INDEX REORGANIZE | > 10% and < = 30% | Reorganizing an index is always executed online and uses minimal system resources. It defragments the leaf level of clustered and non-clustered indexes on tables and views by physically reordering the leaf-level pages to match the logical, left to right order of the leaf nodes. Reorganizing also compacts the index pages. Reorganizing a specified clustered index compacts all LOB columns that are contained in the clustered index. Reorganizing a non-clustered index compacts all LOB columns that are non-key (included) columns in the index. Reorganize does NOT update statistics, this should be run manually. Single threaded only – regardless of edition |
ALTER INDEX REBUILD WITH (ONLINE = ON) | > 30% | Rebuilding an index can be executed online or offline. To achieve availability similar to the reorganize option, you should rebuild indexes online. The ONLINE option and parallelism are available for Enterprise Edition only! When performed offline, the entire table is unavailable for the duration of the operation. Defragments all levels of the index and update statistics. |
Warning
There are other methods (e.g., drop and recreate cluster index), but are more complicated and less recommended.
Fragmentation alone is not a sufficient reason to reorganize or rebuild an index. The main effect of fragmentation is that it slows down page read-ahead output during index scans. This causes slower response times. If the query workload on a fragmented table or index does not involve scans, because the workload is primarily singleton lookups, removing fragmentation may have no effect.
These values (in When column compared with avg_fragmentation_in_percent) provide a rough guideline for determining the point at which you should switch between ALTER INDEX REORGANIZE and ALTER INDEX REBUILD. However, the actual values may vary from case to case. It is important that you experiment to determine the best threshold for your environment. Very low levels of fragmentation (less than 5%) should not be addressed by either of these commands because the benefit from removing such a small amount of fragmentation is almost always vastly outweighed by the cost of reorganizing or rebuilding the index. The decision should be take into consideration SQL Server Edition.
In general, fragmentation on small indexes is often not controllable. The pages of small indexes are stored on mixed extents. Mixed extents are shared by up to eight objects, so the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index.