یکی از مهمترین سیستم های پایگاه داده که امروزه در اکثر سازمان ها مورد استفاده قرار می گیرد SQL Server می باشد. در کشور ما نیز اکثر سازمان ها و شرکت ها از SQL Server برای ذخیره سازی اطلاعات حساس و مهم خود استفاده می کنند.
لذا علاوه بر حفظ و نگهداری از آن، بازدهی و کارایی بالای آن نیز دارای اهمیت می باشد. Performance پایگاه داده و مانیتور کردن آن از مواردی است که برای مدیران IT سازمان ها مهم و تعیین کننده می باشد. اجرای صحیح Job ها، وجود ارتباط بین پایگاه داده و برنامه های کاربردی، پاسخ دهی به موقع و مناسب به درخواستها، همگی از مواردی است که بر کارکرد دیتابیس تاثیر گذار است.
سیستم جامع مانیتورینگ زبیکس با پوشش موارد فوق این امکان را فراهم می آورد تا در صورت ایجاد خلل یا کاهش بازدهی سیستم، مدیران را با خبر ساخته و علت وقوع آن را ردیابی و گزارش دهی کند.
برخی از ویژگی های مانیتورینگ پایگاه داده SQL Server :
-میزان استفاده SQL Server از CPU : سروری که برای SQL انتخاب می شود باید علاوه بر پاسخ دهی به پردازش های سیستم عامل که معمولا windows server است به سرویس SQL Server نیز پاسخ دهی مناسبی داشته باشد.
-میزان استفاده SQL Server از مموری: میزان فضای موجود برای اجرای SQL Server از موارد تاثیر گذار بر بازدهی و کارایی دیتابیس می باشد. در صورت کم بودن این فضا میزان ورودی و خروجی و در نتیجه مصرف پردازنده افزایش یافته و باعث کاهش کارایی SQL Server خواهد شد.
-حجم داده های ذخیره شده: شامل مواردی از قبیل Data Files, Tempdb, Log Files می باشد.
-بررسی اجرا بودن سرویس SQL : در صورت اجرا نبودن سرویس، پس از گذشت مدت زمانی خاص ، زبیکس این قابلیت را دارد تا سرویس را اجرا یا ریستارت کند.
برخی از آیتم های مانیتور شده برای SQL Server
Description |
Name |
(Unused physical memory (not page file |
Available Mbytes |
The average wait time, in milliseconds, for each lock request that had to wait An average wait time longer than 500ms may indicate excessive blocking. This value should generally correlate to “Lock Waits/sec” and move up or down with it accordingly |
Average Wait Time |
The average latch wait time, in millisec- onds, for any latch requests that had to wait. This value should generally correlate to “Latch Waits/sec” and move up or down with it accordingly |
Avg Latch Waits Time |
Number of batch requests received per second, and is a good general indicator for the activity level of the SQL Server. This counter is highly dependent on the hardware and quality of code running on the server. The more powerful the hardware, the higher this number can be, even on poorly coded applications. A value of 1000 batch requests/sec is easily attainable though a typical 100Mbs NIC can only handle about 3000 batch requests/sec.Many other counter thresh- olds depend upon batch requests/sec while, in some cases, a low (or high) number does not point to poor processing power. You should frequently use this counter in combination with other counters, such as processor utilization or user connections.In version 2000, “Transactions/ overall activity, while versions 2005 and later use “Batch Requests/sec”. Versions 2005 prior to SP2, measure this counter differently and may lead to some misunderstandings. Read the footnote for more details
|
Batch Requests |
Long a stalwart counter used by SQL Server DBAs, this counter is no longer very useful. It monitors the percentage of data requests answer from the buffer cache since the last reboot. However, other counters are much better for showing current memory pressure that this one because it blows the curve. For example, PLE (page life expectancy) might suddenly drop from 2000 to 70, while buffer cache hit ration moves only from 98.2 to 98.1. Only be concerned by this counter if it’s value is regularly below 90 (for OLTP) or 80 (for very large OLAP) |
Buffer cache hit ratio |
Number of database pages in the buffer pool, as opposed to other usages for memory such as free pages, procedure cache, etc |
Database Pages |
Cumulative size (KB) of all the data files in the database including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of tempdb |
Data File(s) Size |
The number of latches in the last second that had to wait. Latches are lightweight means of holding a very transient server resource, such as an address in memory. |
Latch Waits |
The number of new locks and locks converted per second. This metric’s value should generally correspond to “Batch Re- quests/sec”. Values > 1000 may indicate queries are accessing very large numbers of rows and may benefit from tuning |
Lock Requests |
Shows the number of lock requests per second that timed out, including internal requests for NOWAIT locks. A value greater than zero might indicate that user queries are not completing. The lower this value is, the better |
Lock Timeouts |
The total time spent waiting across all transactions, in milliseconds, to acquire a lock in the last second. Because SQL Server records a lock at the end of a lock- ing event, remember that an application with huge transactions may have inflated lock wait times while still performing as expected. For example, an applica- tion that issues multi-million record updates might have very long lock wait times while performing exactly as it was designed |
Lock Waits |
The number of user logins per second. Any value over 2 may indicate insufficient connection pooling |
Logins |
The number of user logouts per second. Any value over 2 may indicate insufficient connection pooling |
Logouts |
Cumulative size, in (KB), of all the transaction log files for the specific database. Useful for deter- mining trends and utilization of the transaction log |
Log File(s) Size |
The cumulative used size of all the log files in the database |
Log File(s) Used Size |
Tells, on average, how many seconds SQL Server expects a data page to stay in cache. The target on an OLTP system should be at least 300 (5 min). When under 300, this may indicate poor index design (leading to increased disk I/O and less effective use of memory) or, simply, a potential shortage of memory |
Page life expectancy |
Number of physical database page reads issued. Normal OLTP workloads support 80 – 90 per second, but higher values may be a yellow flag for poor indexing or insufficient memory |
Page Reads |
< 20 per 100 Batch Requests/Sec Monitors the number of page splits per second which occur due to overflowing index pages and should be as low as possible. To avoid page splits, review table and index design to reduce non-sequential inserts or implement fillfactor and pad_index to leave more empty space per page. NOTE: A high value for this counter is not bad in situations where many new pages are being created, since it includes new page allocations |
Page Splits |
Number of database pages physically written to disk. Normal OLTP workloads support 80 – 90 per second. Values over 90 should be crossed checked with “lazy writer/sec” and “checkpoint” counters. If the other counters are also high, then it may indicate insufficient memory |
Page Writes |
(Number of times that Transact-SQL compilations occurred, per second (including recompiles The lower this value is the better. High values often indicate excessive adhoc querying and should be as low as possible. If excessive adhoc querying is happening, try rewriting the queries as procedures or invoke the queries using sp_ex- ecuteSQL. When rewriting isn’t possible, consider using a plan guide or setting the database to parameterization forced mode |
SQL Compilations |
Number of times, per second, that Transact-SQL objects attempted to be executed but had to be recompiled before completion. This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter’s value should follow in proportion to “Batch Requests/sec” and “SQL Compilations/ sec”. This needs to be nil in your system as much as possible |
SQL Re-Compilations |
Percentage of elapsed time that the process threads spent executing code in privileged mode, like SQL Server I/O requests |
SQL Server %Privileged Time |
Percentage of processor time spent on SQL Server process threads.You may also wish to investigate other Process (sqlservr) such as Private Bytes, Virtual Bytes, Working Set, etc to get a fuller understanding of how SQL Server allocates certain segments of memory. Usually, these auxiliary counters provide contextual information and are not necessary for troubleshooting |
SQL Server %Processor Time |
Percentage of processor time spent on SSAS process threads |
SSAS %Processor Time |
Number of errors per second which takes a data- base offline or kills a user connection, respec- tively. Since these are severe errors, they should occur very infrequently |
SQL Server Errors |
Tells how many pages were “stolen” from the buffer pool to satisfy other memory needs, such as plan cache and workspace memory This number is a good metric to determine how much data is flowing into SQL Server caches and should remain proportionate to “Batch Requests/sec”. Also remember to look for where these stolen pages might be stolen from – optimizer memory, lock memory, and so forth |
Stolen Pages |
Shows the amount of memory that SQL Server wants to use based on the configured Max Server Memory |
Target SQL Server Memory |
The total latch wait time in milliseconds spent waiting for a latch in the last second. This value should stay stable compared to the number of latch waits per second |
Total Latch Wait Time |
Shows the amount of memory that SQL Server is currently using. This value should grow until its equal to Target Server Memory, as it popu- lates its caches and loads pages into memory. When it has finished, SQL Server is said to be in a “steady-state”. Until it is in steady-state, per- formance may be slow and IO may be higher |
Total SQL Server Memory |
The number of users currently connected to the SQL Server. This counter should roughly track with “Batch Requests/Sec”. They should generally rise and fall togeth- er. For example, blocking problems could be revealed by rising user connections, lock waits and lock wait time coupled with declining batch requests/sec |
User Connections |