Since the move to all-flash storage solutions about ten years ago, CPU utilization and bottlenecks have frequently become the primary performance concern for Oracle DBAs. Whereas in the past the top wait events in a Statspack or AWR report might be “db file scattered read” or “db file sequential read”, the contemporary DBA is far more likely to see “CPU” atop the list.
It is important to remember that seeing “CPU” atop the wait events in an AWR report might indicate that the host CPUs are saturated, or it might indicate that single-threaded processes within the database are waiting on a single core to complete a compute operation.
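One rough way to tell the two apart – a sketch, assuming your system is licensed for the Diagnostics Pack so that ASH is available – is to compare the average number of sessions on CPU over a recent window against the CPU count:

-- Average active sessions on CPU over the last 5 minutes
-- (ASH samples roughly once per second, so ~300 samples per busy session)
select round(count(*) / 300, 1) as avg_sessions_on_cpu
from   v$active_session_history
where  session_state = 'ON CPU'
and    sample_time > sysdate - interval '5' minute;

-- Compare against the CPU count visible to Oracle
select value as cpu_count from v$osstat where stat_name = 'NUM_CPUS';

If avg_sessions_on_cpu approaches cpu_count, the host is saturated; if it stays low while “CPU” still tops the report, suspect a small number of hot, single-threaded processes.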
And there is a trade-off between more cores and clock speed. Generally speaking, within each generation of CPU, more cores means a slower clock. A denser processor with more cores can handle a larger number of simultaneous compute operations – ideal for parallel query processing, analytics, AI and consolidation – whereas a lower core count with a higher clock speed is generally better suited to transaction processing applications, as well as being dramatically less expensive to license for the Oracle software.
Keeping an eye on host utilization is therefore an important part of an operational DBA's task list, and there are many excellent tools to do this across a large enterprise. In this post we explore a simple method to monitor host utilization from within the database, meaning our DBA can use SQL*Plus access via TNS to keep track of the underlying hosts.
Oracle provides a rich set of internal views which we can mine to see what is happening on the database host. The first is V$OSSTAT, and its RAC counterpart GV$OSSTAT, which report many important OS data points:
SQL> select stat_name, value from v$osstat;
STAT_NAME VALUE
---------------------------------------------------------------- ----------
NUM_CPUS 72
IDLE_TIME 673858778
BUSY_TIME 83479087
USER_TIME 35776503
SYS_TIME 36340001
IOWAIT_TIME 153773190
NICE_TIME 52
RSRC_MGR_CPU_WAIT_TIME 0
LOAD .229492188
NUM_CPU_CORES 72
NUM_CPU_SOCKETS 4
PHYSICAL_MEMORY_BYTES 6.0686E+11
VM_IN_BYTES 0
VM_OUT_BYTES 0
FREE_MEMORY_BYTES 5.5386E+11
INACTIVE_MEMORY_BYTES 4936794112
SWAP_FREE_BYTES 1.7180E+10
TCP_SEND_SIZE_MIN 4096
TCP_SEND_SIZE_DEFAULT 16384
TCP_SEND_SIZE_MAX 4194304
TCP_RECEIVE_SIZE_MIN 4096
TCP_RECEIVE_SIZE_DEFAULT 87380
TCP_RECEIVE_SIZE_MAX 6291456
GLOBAL_SEND_SIZE_MAX 1048576
GLOBAL_RECEIVE_SIZE_MAX 4194304
In the above example we are running Oracle 19c RAC, and on this node our host has 4 sockets with 18-core processors for a total of 72 cores. Hyper-threading is disabled on this system, so we have 72 CPUs visible to Oracle. The host reports roughly 565 GiB of RAM (6.0686E+11 bytes) to Oracle.
The V$OSSTAT view also reports CPU time counters: BUSY_TIME, IDLE_TIME, SYS_TIME, USER_TIME, IOWAIT_TIME and NICE_TIME. From these it should be possible to calculate how busy our database host is.
Some years back Craig Shallahamer published an interesting blog post Simple Way to Calculate OS CPU Utilization From Oracle Database V$ Views, in which he explored how to calculate host utilization from inside the Oracle database. In that post Craig offered up the formula:
U = R / C
where U is utilization, R is requirements, and C is capacity.
Capacity is what the host is capable of delivering, which is a simple BUSY_TIME+IDLE_TIME calculation. Requirements is how much is being used – load, if you will – and is represented by BUSY_TIME. Dividing BUSY_TIME by BUSY_TIME+IDLE_TIME delivers utilization:
select
  busy.value/(busy.value+idle.value) "Utilization"
from
  v$osstat busy,
  v$osstat idle
where 1=1
  and busy.stat_name = 'BUSY_TIME'
  and idle.stat_name = 'IDLE_TIME'
SQL> /
Utilization
-----------
.110112201
So in our example here the host is 11% utilized.
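As a sanity check against the V$OSSTAT listing above: 83,479,087 / (83,479,087 + 673,858,778) ≈ 0.1102, in line with the query result (the counters advanced slightly between the two listings).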
However, if we check our Linux utilities we find that they report our host as 99.6% idle:
[oracle@racn1 ~]$ top
top - 17:24:40 up 22 days, 6:39, 3 users, load average: 0.20, 0.19, 0.15
Tasks: 1376 total, 2 running, 1374 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 59263667+total, 54088044+free, 27121272 used, 24634944 buff/cache
KiB Swap: 16777212 total, 16777212 free, 0 used. 54929209+avail Mem
The disconnect here is that the values held in V$OSSTAT are cumulative: the view reports host CPU utilization numbers since the instance started. To be fair to Craig, his blog post was concerned with total CPU utilization rather than a point-in-time reading. An operational DBA is interested in the macro view of course, but is more immediately interested in the current utilization rate.
In order to calculate the current host utilization, we need to establish an epoch, take a measurement, wait a short period of time, and then measure again. Our host utilization formula will then be:
U = (R2-R1) / (C2-C1)
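Conceptually, a minimal sketch of this two-sample approach looks like the anonymous PL/SQL block below. This is an illustration, not Christian's script; it assumes SELECT access on V$OSSTAT and DBMS_SESSION.SLEEP, which arrived in 18c – on older releases substitute DBMS_LOCK.SLEEP:

set serveroutput on
declare
  busy1 number; idle1 number;
  busy2 number; idle2 number;
begin
  -- first sample: R1 and the idle component of C1
  select max(decode(stat_name, 'BUSY_TIME', value)),
         max(decode(stat_name, 'IDLE_TIME', value))
    into busy1, idle1
    from v$osstat;
  dbms_session.sleep(5);
  -- second sample: R2 and the idle component of C2
  select max(decode(stat_name, 'BUSY_TIME', value)),
         max(decode(stat_name, 'IDLE_TIME', value))
    into busy2, idle2
    from v$osstat;
  -- U = (R2-R1) / (C2-C1), where C = BUSY_TIME + IDLE_TIME
  dbms_output.put_line('Utilization: ' ||
    round(100 * (busy2 - busy1) /
          ((busy2 - busy1) + (idle2 - idle1)), 2) || '%');
end;
/

Christian's script wraps the same idea in a table function – hence the table(osstat(...)) syntax below – returning one row per sample interval.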
Back in 2009, Christian Antognini provided a clever SQL script to do exactly this in his excellent blog post Report Information about CPU Activity in SQL*Plus.
Let’s see what Christian’s script reports on our 19c database on RAC node 1:
SQL> select user_time, nice_time, sys_time, iowait_time, idle_time from table(osstat(5,5));
USER_TIME NICE_TIME SYS_TIME IOWAIT_TIME IDLE_TIME
--------- --------- -------- ----------- ---------
0.17 0.00 0.15 0.00 99.68
0.14 0.00 0.14 0.00 99.72
0.18 0.00 0.17 0.00 99.64
0.17 0.00 0.20 0.00 99.63
And here is the output of dstat for the same period:
[oracle@racn1 ~]$ dstat -d -n -c
-dsk/total- -net/total- ----total-cpu-usage----
read writ| recv send|usr sys idl wai hiq siq
243k 66k| 178k 1720B| 0 0 99 0 0 0
3072B 66k| 13k 13k| 0 0 100 0 0 0
339k 18k| 47k 29k| 0 0 100 0 0 0
291k 66k|1249B 927B| 0 0 100 0 0 0
51k 403k| 58k 39k| 0 0 100 0 0 0
51k 18k| 186k 74k| 0 0 100 0 0 0
The numbers reported by the SQL script and dstat match almost exactly. But these are numbers while the database is idle. This database is a Swingbench database, so let's launch 100 client connections spread across the two nodes of our RAC and see what happens to load:
SQL> select user_time, nice_time, sys_time, iowait_time, idle_time from table(osstat(5,20));
USER_TIME NICE_TIME SYS_TIME IOWAIT_TIME IDLE_TIME
--------- --------- -------- ----------- ---------
4.02 0.00 5.02 19.68 71.28
3.86 0.00 4.82 19.70 71.62
3.90 0.00 4.83 19.66 71.61
3.84 0.00 4.82 19.69 71.65
4.09 0.00 5.18 19.75 70.99
4.96 0.00 4.82 19.61 70.61
4.35 0.00 4.71 19.48 71.46
5.09 0.00 4.94 19.65 70.32
4.74 0.00 4.95 19.87 70.45
5.03 0.00 4.77 19.92 70.28
4.86 0.00 4.86 19.33 70.96
4.59 0.00 5.01 16.92 73.48
4.45 0.00 4.98 17.91 72.66
4.06 0.00 4.77 19.32 71.86
3.93 0.00 4.65 19.20 72.22
4.16 0.00 4.76 18.74 72.34
4.12 0.00 4.60 18.56 72.72
4.03 0.00 4.65 19.05 72.27
4.02 0.00 4.78 18.91 72.29
20 rows selected.
And here is the output of dstat for the same period:
-dsk/total- -net/total- ----total-cpu-usage----
read writ| recv send|usr sys idl wai hiq siq
4524M 19M|1740k 1651k| 5 6 63 24 0 2
4480M 2536k|1511k 1724k| 5 5 64 24 0 2
4547M 711k| 983k 1111k| 5 6 62 24 0 2
4458M 7138k|1384k 1458k| 4 5 66 23 0 2
4552M 175k|1112k 975k| 5 5 65 23 0 2
4551M 135k|2825k 2250k| 5 6 65 23 0 2
4515M 6088k| 983k 734k| 5 6 63 24 0 2
4522M 1216k| 577k 510k| 5 6 64 23 0 2
4421M 235k|1198k 942k| 5 6 64 24 0 2
4318M 6622k| 615k 518k| 5 6 65 24 0 1
4421M 1564k| 839k 475k| 6 6 64 23 0 2
4482M 433k| 713k 474k| 5 6 66 22 0 1
4486M 12M| 552k 698k| 5 6 66 22 0 1
4585M 2032k| 604k 842k| 5 6 65 23 0 2
4548M 2064k|1029k 1118k| 5 6 66 22 0 2
4548M 14M|1454k 1202k| 5 6 65 22 0 2
4534M 3714k|1105k 1197k| 5 6 65 23 0 2
4449M 185k|2558k 2473k| 4 5 66 23 0 2
4423M 13M| 906k 1031k| 5 5 64 24 0 2
4481M 1721k| 994k 928k| 5 6 65 23 0 2
4525M 1788k|1271k 1020k| 5 6 64 24 0 2
4527M 6847k| 895k 781k| 5 5 65 23 0 2
4481M 1151k|1049k 908k| 5 6 65 22 0 2
4495M 351k|1125k 966k| 5 6 65 22 0 2
4471M 10M|1025k 923k| 5 6 65 23 0 2
4474M 1665k| 935k 702k| 5 5 65 23 0 2
4454M 265k| 773k 629k| 5 6 64 23 0 2
4461M 12M| 979k 871k| 5 6 64 23 0 2
4456M 1920k| 932k 984k| 5 6 64 24 0 2
4457M 285k|2485k 2197k| 5 6 65 23 0 1
4515M 12M| 955k 784k| 5 6 64 23 0 2
4484M 2039k| 963k 1070k| 5 6 64 24 0 1
4631M 187k| 946k 1059k| 5 6 65 23 0 2
4598M 5184k| 868k 953k| 5 6 65 23 0 2
4478M 1347k| 773k 865k| 5 6 64 24 0 2
4521M 196k| 943k 828k| 5 6 63 24 0 2
Interesting. The SQL script reports that the host is at least 70% idle, but dstat reports a lower value of around 65%. Not a huge discrepancy, but what happens if we shut down Node 2 of our RAC and force all the connections to fail over to Node 1:
SQL> select user_time, nice_time, sys_time, iowait_time, idle_time from table(osstat(5,20));
USER_TIME NICE_TIME SYS_TIME IOWAIT_TIME IDLE_TIME
--------- --------- -------- ----------- ---------
9.17 0.00 17.51 28.41 44.91
9.08 0.00 17.91 28.38 44.63
9.07 0.00 17.59 28.37 44.97
9.24 0.00 17.57 28.72 44.47
8.99 0.00 17.48 28.68 44.84
9.19 0.00 17.45 28.28 45.07
9.14 0.00 17.04 28.54 45.27
8.74 0.00 16.67 29.13 45.46
9.05 0.00 17.27 28.66 45.03
8.93 0.00 17.18 28.97 44.93
9.22 0.00 17.23 28.58 44.97
9.41 0.00 17.28 28.04 45.27
9.15 0.00 17.24 28.62 44.99
10.03 0.00 16.68 27.87 45.42
9.30 0.00 16.82 28.56 45.32
10.17 0.00 17.08 27.54 45.21
9.29 0.00 16.90 28.40 45.41
9.09 0.00 16.56 28.78 45.57
9.21 0.00 16.61 28.66 45.51
The SQL script is reporting a somewhat healthy ~45% idle on the CPU. Plenty of headroom, you might conclude. However, dstat reports an entirely different situation, with just over 20% idle on the CPUs. Not a crisis situation yet, but were a DBA to add yet more load thinking there is plenty of headroom, performance could be severely impacted:
-dsk/total- -net/total- ----total-cpu-usage----
read writ| recv send|usr sys idl wai hiq siq
9037M 1008k| 189k 9094B| 12 23 20 36 0 9
8935M 6762k|9414B 5677B| 12 22 22 37 0 8
8939M 958k| 63k 44k| 12 21 20 38 0 9
9002M 1057k| 36k 22k| 12 21 21 36 0 9
8934M 7191k| 51k 29k| 14 21 22 34 0 9
8946M 2923k| 415k 129k| 13 21 24 32 0 10
8875M 51M| 116k 69k| 12 21 23 35 0 9
8942M 17M| 108k 64k| 12 21 21 36 0 10
8983M 583k| 79k 50k| 12 21 21 36 0 9
8958M 5549k| 344k 334k| 12 22 21 37 0 9
8969M 30M| 182k 5087B| 12 22 20 37 0 9
8992M 156k| 29k 21k| 12 21 23 36 0 8
8917M 1279k| 110k 60k| 13 21 23 34 0 9
8899M 18M| 206k 109k| 14 21 23 34 0 8
8901M 33M| 161k 91k| 13 21 24 34 0 9
8946M 16M| 239k 33k| 12 23 20 36 0 9
8837M 27M| 61k 38k| 12 21 22 37 0 8
8874M 560k| 57k 35k| 12 21 22 36 0 8
8872M 464k| 107k 66k| 12 22 22 36 0 8
8869M 28M| 25k 14k| 12 21 23 36 0 8
8973M 369k| 216k 28k| 12 22 21 36 0 9
8981M 565k| 343k 333k| 12 22 21 37 0 9
8860M 17M| 20k 12k| 11 21 21 37 0 9
8968M 300k| 101k 66k| 12 22 20 36 0 10
The issue here is that Oracle is folding the IOWAIT_TIME into the IDLE_TIME. The Linux utilities are not.
There are lengthy blog posts and forum discussions as to whether waiting on IO is an idle event or not, but that argument is beyond the scope of this blog post. I encourage you to research this topic if it interests you and draw your own conclusions.
What we are trying to do here is to report the host utilization in a manner consistent with what the Linux native tools are reporting.
To arrive at the Linux idle figure (Ix), we need to subtract the delta of IOWAIT_TIME (W) from the delta of IDLE_TIME (I), and then divide the result by the delta of capacity – which is still IDLE_TIME + BUSY_TIME:
Ix = ((I2-W2) - (I1-W1)) / (C2-C1)
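Against the raw cumulative counters, the adjustment can be sketched as a variation of the earlier utilization query (the same caveat applies: for a current reading you would still take two samples and work with the deltas):

select
  (idle.value - iow.value) / (busy.value + idle.value) "Linux Idle"
from
  v$osstat busy,
  v$osstat idle,
  v$osstat iow
where 1=1
  and busy.stat_name = 'BUSY_TIME'
  and idle.stat_name = 'IDLE_TIME'
  and iow.stat_name  = 'IOWAIT_TIME';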
I made these changes to Christian Antognini’s script and uploaded a modified version to GitHub.
With the second RAC instance still shut down and our 100 Swingbench sessions all running on Node 1, let’s see how the revised SQL script reports host idle against what dstat sees. SQL script output first:
SET ARRAYSIZE 1
COLUMN user_time FORMAT 990.00
COLUMN nice_time FORMAT 990.00
COLUMN sys_time FORMAT 990.00
COLUMN iowait_time FORMAT 990.00
COLUMN idle_time FORMAT 990.00
SQL> select user_time, nice_time, sys_time, iowait_time, idle_time from table(osstat2(5,20));
USER_TIME NICE_TIME SYS_TIME IOWAIT_TIME IDLE_TIME
--------- --------- -------- ----------- ---------
12.19 0.00 21.73 37.45 20.42
11.54 0.00 21.54 37.57 20.44
11.63 0.00 21.60 37.06 20.53
11.52 0.00 21.68 37.06 21.26
12.48 0.00 21.87 37.48 19.71
12.84 0.00 21.76 36.69 19.35
12.67 0.00 21.90 37.35 19.26
13.00 0.00 22.06 36.76 19.71
12.65 0.00 21.40 37.43 19.43
12.52 0.00 21.29 37.96 19.47
12.72 0.00 21.57 38.03 19.67
12.45 0.00 21.42 38.30 19.39
11.89 0.00 21.14 37.56 20.31
11.34 0.00 21.80 37.11 21.09
12.33 0.00 22.01 38.06 19.21
11.39 0.00 21.23 37.61 20.72
12.91 0.00 21.50 36.87 19.85
13.32 0.00 21.98 36.66 19.77
13.01 0.00 21.29 37.47 19.50
20 rows selected.
And now the output of dstat for the same period:
-dsk/total- -net/total- ----total-cpu-usage----
read writ| recv send|usr sys idl wai hiq siq
8909M 5076k| 247k 37k| 13 21 20 37 0 9
8840M 34M| 53k 32k| 13 21 19 37 0 9
8977M 16M| 66k 38k| 13 22 19 38 0 9
8904M 1957k| 88k 54k| 13 21 20 38 0 9
8880M 10M| 56k 38k| 13 20 20 38 0 9
8957M 217k| 195k 10k| 13 22 19 38 0 9
8950M 3087k| 29k 18k| 12 21 20 38 0 9
8905M 13M| 50k 32k| 13 21 19 39 0 9
8989M 490k| 97k 56k| 13 21 19 37 0 9
8934M 2644k| 402k 367k| 14 21 19 37 0 9
8914M 15M| 239k 38k| 13 22 20 37 0 9
8949M 291k| 27k 16k| 13 23 19 38 0 8
8989M 3240k| 60k 37k| 13 22 19 37 0 9
8936M 22M| 72k 47k| 13 22 20 37 0 8
8948M 627k| 23k 18k| 13 22 19 38 0 8
8980M 2347k| 191k 12k| 13 22 20 36 0 9
Restarting the second RAC node and rebalancing the connections brings down the load on Node 1, and again the modified script reports roughly the same values for IDLE and IOWAIT that dstat and top report.
It is worth noting that these results might be unique to Linux and how IOWAITs are reported. I believe both Craig's and Christian's work was done on AIX.
Thanks to Craig Shallahamer and Christian Antognini for their excellent work.
