This article explains how ggRock Stats (Grafana) can be used to monitor the server and troubleshoot existing issues.
Network Cards (NICs)
Stats setup
You will always see at least two NICs. One of them is lo (the loopback interface, equivalent to localhost), which you can ignore. Most servers have a network bridge set up (for VMs), so in general you should see and select vmbr0.
If vmbr0 is missing, the bridge was not configured, and you have to select the NICs one by one until you find the one showing heavy traffic on the chart. A server can have multiple NICs, but in general only one of them is used unless NIC teaming/bonding is set up.
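As an alternative to clicking through the NICs in the chart, the busiest interface can be estimated from the kernel counters. The following is a minimal sketch that assumes the server runs Linux and exposes the standard /proc/net/dev file; the function name busiest_interface is hypothetical and not part of ggRock. The counters are cumulative since boot, so the result only hints at which NIC carries the traffic.

```python
def busiest_interface(proc_net_dev_text: str) -> str:
    """Return the non-loopback interface with the most transmitted bytes.

    Expects the text of /proc/net/dev: two header lines, then one line per
    interface with 8 receive counters followed by 8 transmit counters.
    """
    totals = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, _, counters = line.partition(":")
        fields = counters.split()
        if len(fields) < 16:
            continue  # not a complete interface line
        name = name.strip()
        if name == "lo":
            continue  # loopback is ignored, as described above
        totals[name] = int(fields[8])  # 9th counter is transmitted bytes
    if not totals:
        raise ValueError("no non-loopback interfaces found")
    return max(totals, key=totals.get)


if __name__ == "__main__":
    with open("/proc/net/dev") as handle:
        print(busiest_interface(handle.read()))
```

When a bridge is configured, both vmbr0 and its member NIC will show traffic; vmbr0 remains the one to select in the chart.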
Transmit speed
On the Network Traffic chart, check that the peak Transmit speed for the last day/week is not limited by a poor cable/connection. For example, if the server NIC is 1 Gbit/s, expect a peak transmit speed of up to about 120 MB/s, and much higher than 10 MB/s; for a 10 Gbit/s NIC, expect a peak of up to about 1200 MB/s, and much higher than 100 MB/s. A peak that stays near the lower figure usually means the link has negotiated a lower speed.
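The rule of thumb above can be sketched in Python. The function names are hypothetical, and the cutoff of one tenth of the expected peak is an assumption based on the figures above (a link that dropped one speed tier), not a ggRock setting.

```python
def expected_peak_mb_per_s(link_gbit_per_s: float) -> float:
    """Rough usable peak in MB/s for a link rated in Gbit/s.

    1 Gbit/s is 125 MB/s before protocol overhead; the article's figure of
    about 120 MB/s per Gbit/s is used here.
    """
    return link_gbit_per_s * 120.0


def link_looks_degraded(observed_peak_mb_per_s: float, link_gbit_per_s: float) -> bool:
    """True when the observed peak never exceeds a tenth of the expected peak.

    The one-tenth cutoff is an assumption: a 1 Gbit/s NIC that negotiated
    100 Mbit/s cannot exceed roughly 12 MB/s.
    """
    return observed_peak_mb_per_s <= expected_peak_mb_per_s(link_gbit_per_s) / 10.0


if __name__ == "__main__":
    # A 1 Gbit/s NIC whose weekly peak is 11.5 MB/s is worth a cable check.
    print(link_looks_degraded(11.5, 1))
    # A 1 Gbit/s NIC peaking at 95 MB/s is behaving as expected.
    print(link_looks_degraded(95.0, 1))
```

Read the observed peak from the Network Traffic chart over the last day or week, not from a single moment.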
Receive speed
On the Network Traffic chart, also check the Receive speed. It is uncommon for that value to be higher than Transmit, because Machines read more often than they write. A high Receive speed means that Machines are trying to save a lot of data, which usually generates high I/O delay, high disk I/O Writes, and high disk IOPS Writes. This can have several causes; one of them is Defragmentation being enabled (it should be disabled).
CPU Usage
On the CPU usage chart, the most important value is I/O delay, which ideally stays below 3%. Values under 10% are still relatively fine. If you observe values above 10%, there is either too much Write traffic or the drives are too slow. CPU usage itself should not reach 100%. In general, without VMs running, it is around 20-30% under full load, but it can vary depending on the hardware used and the number of Machines connected.
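The I/O delay thresholds above can be expressed as a small helper. This is only a sketch: the function name and the labels are hypothetical, and treating exactly 10% as still acceptable is an assumption.

```python
def classify_io_delay(io_delay_percent: float) -> str:
    """Map an I/O delay percentage to the bands described above."""
    if io_delay_percent < 0:
        raise ValueError("I/O delay cannot be negative")
    if io_delay_percent < 3.0:
        return "ideal"
    if io_delay_percent <= 10.0:  # assumption: exactly 10% still counts as fine
        return "acceptable"
    return "investigate"  # too much Write traffic or drives too slow


if __name__ == "__main__":
    for value in (1.5, 7.0, 18.0):
        print(value, classify_io_delay(value))
```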
RAM Cache
RAM Cache usage on a recently rebooted server starts at 0 and rises as the server runs and Machines operate, eventually reaching nearly 100%.
RAM Cache hits
RAM Cache hits are usually above 60-70% and, depending on Machine activity, can exceed 90%. The optimal target for the 7-day average is 80%. Values below 60% mean that more than 40% of data requests bypass the RAM Cache and hit your disks. In that case, consider increasing the amount of RAM to reach ~80% RAM Cache hits.
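The relationship between the hit rate and the share of requests served from disk is simple arithmetic, sketched below. The function names and the wording of the returned advice are hypothetical; the 60% and 80% thresholds come from the text above.

```python
def disk_share_percent(cache_hit_percent: float) -> float:
    """Share of data requests that miss the RAM Cache and go to disk."""
    if not 0.0 <= cache_hit_percent <= 100.0:
        raise ValueError("hit rate must be between 0 and 100")
    return 100.0 - cache_hit_percent


def assess_cache_hits(avg_hit_percent_7d: float) -> str:
    """Compare a 7-day average hit rate with the targets described above."""
    disk_share_percent(avg_hit_percent_7d)  # reuse the range validation
    if avg_hit_percent_7d >= 80.0:
        return "on target"
    if avg_hit_percent_7d >= 60.0:
        return "typical, below target"
    return "consider adding RAM"


if __name__ == "__main__":
    print(disk_share_percent(55.0))  # requests served from disk, in percent
    print(assess_cache_hits(55.0))
```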
Disk IOPS and Disk I/O
Disk IOPS and Disk I/O represent drive traffic, that is, how much data is written to and read from the physical drives. Reviewing these numbers is usually only informative in conjunction with the Network Traffic and I/O delay figures. For example, when I/O delay is high, you would typically see increased disk Writes and can then look at the Write speeds at which the drives become a bottleneck.
Memory usage
Memory usage is often close to Usage = Maximum - Server Reserved. We always expect a small amount of RAM to remain free for the ggRock server to operate (that is the purpose of the Server Reserved setting in the ggRock UI).
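As a worked example of the formula above, the sketch below computes the expected memory usage from the two values. The function name and the sample figures (64 GB maximum, 8 GB reserved) are assumptions for illustration only.

```python
def expected_memory_usage_gb(maximum_gb: float, server_reserved_gb: float) -> float:
    """Usage = Maximum - Server Reserved, both in GB."""
    if server_reserved_gb < 0 or server_reserved_gb > maximum_gb:
        raise ValueError("reserved RAM must be between 0 and the maximum")
    return maximum_gb - server_reserved_gb


if __name__ == "__main__":
    # Hypothetical server: 64 GB maximum with 8 GB reserved for the server.
    print(expected_memory_usage_gb(64, 8))
```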