مانیتورینگ پایگاه داده SQL Server

یکی از مهمترین سیستم های پایگاه داده که امروزه در اکثر سازمان ها مورد استفاده قرار می گیرد SQL Server  می باشد. در کشور ما نیز اکثر سازمان ها و شرکت ها از SQL Server برای ذخیره سازی اطلاعات حساس و مهم خود استفاده می کنند.

لذا علاوه بر حفظ و نگهداری از آن، بازدهی و کارایی بالای آن نیز دارای اهمیت می باشد. Performance پایگاه داده و مانیتور کردن آن از مواردی است که برای مدیران IT سازمان ها مهم و تعیین کننده می باشد. اجرای صحیح Job ها، وجود ارتباط بین پایگاه داده و برنامه های کاربردی، پاسخ دهی به موقع و مناسب به درخواستها، همگی از مواردی است که بر کارکرد دیتابیس تاثیر گذار است.

 

سیستم جامع مانیتورینگ زبیکس با پوشش موارد فوق این امکان را فراهم می آورد تا در صورت ایجاد خلل یا کاهش بازدهی سیستم، مدیران را با خبر ساخته و علت وقوع آن را ردیابی و گزارش دهی کند.

 

 

برخی از ویژگی های مانیتورینگ  پایگاه داده SQL Server :

 

 

-میزان استفاده SQL Server از CPU : سروری که برای SQL انتخاب می شود باید علاوه بر پاسخ دهی به پردازش های  سیستم عامل که معمولا windows server است به سرویس SQL Server نیز پاسخ دهی مناسبی داشته باشد.

 

-میزان استفاده SQL Server  از مموری: میزان فضای موجود برای  اجرای SQL Server  از موارد تاثیر گذار بر بازدهی و کارایی دیتابیس می باشد. در صورت کم بودن این فضا میزان ورودی و خروجی و در نتیجه مصرف پردازنده افزایش یافته و باعث کاهش کارایی SQL Server خواهد شد.

 

-حجم داده های ذخیره شده: شامل مواردی از قبیل Data Files, Tempdb, Log Files می باشد.

 

-بررسی اجرا بودن سرویس SQL : در صورت اجرا نبودن سرویس، پس از گذشت مدت زمانی خاص ، زبیکس این قابلیت را دارد تا سرویس را اجرا یا ریستارت کند.

 

 

 برخی از آیتم های مانیتور شده برای SQL Server

 

 Description

 Name

(Unused physical memory (not page file 

Available Mbytes 

The average wait time, in milliseconds, for each lock request that had to wait

 An average wait time longer than 500ms may indicate excessive blocking. This value should generally correlate to “Lock Waits/sec” and move up or down with it accordingly

Average Wait Time 

The average latch wait time, in millisec- onds, for any latch requests that had to wait. This value should generally correlate to “Latch Waits/sec” and move up or down with it accordingly

Avg Latch Waits Time 

Number of batch requests received per second, and is a good general indicator for the activity level of the SQL Server. This counter is highly dependent on the hardware and quality of code running on the server. The more powerful the hardware, the higher this number can be, even on poorly coded applications. A value of 1000 batch requests/sec is easily attainable though a typical 100Mbs NIC can only handle about 3000 batch requests/sec.Many other counter thresh- olds depend upon batch requests/sec while, in some cases, a low (or high) number does not point to poor processing power. You should frequently use this counter in combination with other counters, such as processor utilization or user connections.In version 2000, “Transactions/

overall activity, while versions 2005 and later use “Batch Requests/sec”. Versions 2005 prior to SP2, measure this counter differently and may lead to some misunderstandings. Read the footnote for more details

 

Batch Requests 

Long a stalwart counter used by SQL Server DBAs, this counter is no longer very useful. It monitors the percentage of data requests answer from the buffer cache since the last reboot. However, other counters are much better for showing current memory pressure that this one because it blows the curve. For example, PLE (page life expectancy) might suddenly drop from 2000 to 70, while buffer cache hit ration moves only from 98.2 to

98.1. Only be concerned by this counter if it’s value is regularly below 90 (for OLTP) or 80 (for very large OLAP)

Buffer cache hit ratio 

Number of database pages in the buffer pool, as opposed to other usages for memory such as free pages, procedure cache, etc

Database Pages 

Cumulative size (KB) of all the data files in the database including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of tempdb

Data File(s) Size 

The number of latches in the last second that had to wait. Latches are lightweight means of holding a very transient server resource, such as an address in memory. 

Latch Waits 

The number of new locks and locks converted per second. This metric’s value should generally correspond to “Batch Re- quests/sec”. Values > 1000 may indicate queries are accessing very large numbers of rows and may benefit from tuning

Lock Requests 

Shows the number of lock requests per second that timed out, including internal requests for NOWAIT locks. A value

greater than zero might indicate that user queries are not completing. The lower

this value is, the better

Lock Timeouts 

The total time spent waiting across all transactions, in milliseconds, to acquire a lock in the last second. Because SQL

Server records a lock at the end of a lock- ing event, remember that an application with huge transactions may have inflated lock wait times while still performing

as expected. For example, an applica- tion that issues multi-million record updates might have very long lock wait times while performing exactly as it was designed

Lock Waits 

The number of user logins per second. Any value over 2 may indicate insufficient connection pooling

Logins 

The number of user logouts per second. Any value over 2 may indicate insufficient connection pooling

Logouts 

Cumulative size, in (KB), of all the transaction log files for the specific database. Useful for deter- mining trends and utilization of the transaction log

Log File(s) Size

The cumulative used size of all the log files in the database

Log File(s) Used Size

Tells, on average, how many seconds SQL Server expects a data page to stay in cache. The target on an OLTP system should be at least 300 (5 min). When under 300, this may indicate poor index design (leading to increased disk I/O

and less effective use of memory) or, simply, a potential shortage of memory

Page life expectancy

Number of physical database page reads issued. Normal OLTP workloads support 80 – 90 per second, but higher values may be a yellow flag for poor indexing or insufficient memory

Page Reads

< 20 per 100 Batch Requests/Sec

Monitors the number of page splits per second which occur due to overflowing index pages and should be as low as possible. To avoid page splits, review table and index design to reduce non-sequential inserts or implement fillfactor and pad_index to leave more empty space per page. NOTE: A high value for this counter is not bad in situations where many new pages are being created, since it includes new page allocations

Page Splits

Number of database pages physically written to disk. Normal OLTP workloads support 80 – 90 per second. Values over 90 should be crossed checked with “lazy writer/sec” and “checkpoint” counters. If the other counters are also high, then it may indicate insufficient memory 

Page Writes

(Number of times that Transact-SQL compilations occurred, per second (including recompiles

The lower this value is the better. High values often indicate excessive adhoc querying and should be as low as possible. If excessive adhoc querying is happening, try rewriting the queries as procedures or invoke the queries using sp_ex- ecuteSQL. When rewriting isn’t possible, consider using a plan guide or setting the database to parameterization forced mode

SQL Compilations

Number of times, per second, that Transact-SQL

objects attempted to be executed but had to

be recompiled before completion. This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter’s value should follow in proportion to “Batch Requests/sec” and “SQL Compilations/ sec”. This needs to be nil in your system as much as possible

SQL Re-Compilations

Percentage of elapsed time that the process threads spent executing code in privileged mode, like SQL Server I/O requests

SQL Server %Privileged Time

Percentage of processor time spent on SQL Server process threads.You may also wish

to investigate other Process (sqlservr) such as Private Bytes, Virtual Bytes, Working Set, etc to get a fuller understanding of how SQL Server allocates certain segments of memory. Usually, these auxiliary counters

provide contextual information and are not necessary for troubleshooting

SQL Server %Processor Time

Percentage of processor time spent on

SSAS process threads

SSAS %Processor Time

Number of errors per second which takes a data- base offline or kills a user connection, respec- tively. Since these are severe errors, they should occur very infrequently

SQL Server Errors

Tells how many pages were “stolen” from the buffer pool to satisfy other memory needs, such as plan cache and workspace memory

This number is a good metric to determine how much data is flowing into SQL Server caches

and should remain proportionate to “Batch Requests/sec”. Also remember to look for where these stolen pages might be stolen from – optimizer memory, lock memory, and so forth

Stolen Pages

Shows the amount of memory that SQL Server wants to use based on the configured Max Server Memory

Target SQL Server Memory

The total latch wait time in milliseconds spent waiting for a latch in the last second. This value should stay stable compared to the number of latch waits per second

Total Latch Wait Time

Shows the amount of memory that SQL Server is currently using. This value should grow until its equal to Target Server Memory, as it popu- lates its caches and loads pages into memory. When it has finished, SQL Server is said to be in a “steady-state”. Until it is in steady-state, per- formance may be slow and IO may be higher

Total SQL Server Memory

The number of users currently connected to the SQL Server. This counter should roughly track with “Batch Requests/Sec”. They should generally rise and fall togeth- er. For example, blocking problems could be revealed by rising user connections, lock waits and lock wait time coupled

with declining batch requests/sec

User Connections