Monitoring Oracle Database Performance with Grafana – Part II

This the second part of a blog post on using the Grafana graphing tool to create an Oracle database performance dashboard for monitoring host and database performance in real time.

In the previous blog post I created a process to read key metrics from Oracle and insert them into a PostgreSQL database. This intermediate step was necessary as the free edition of Grafana does not include the connector to read directly from Oracle.

Once the metrics were loaded into a PostgreSQL table, a second process calculated deltas for those Oracle metrics that are cumulative. The results were then read by Grafana to show host performance and also report on the number of connected Swingbench users.

In this blog post I am going to expand on that and add IO metrics to track read and write IOPs and also throughput.

There is a video for this blog

As before, all code is available from my GitHub repository.

If you have not read Part I, you can find it here

Our previous blog post read from the Oracle views GV$OSSTATS and GV$SESSION. There are a couple of different views to consult for IO operations, but based on my previous blog post, I am going to rely on querying the GV$IOSTAT_FILE view.

Like the GV$OSSTAT view, the metrics in GV$IOSTAT_FILE are cumulative, so I am going to scrape that table and insert the numbers into a table in PostgreSQL using an expanded version of the script I used in the previous blog. Once the metrics have been recorded, I will calculate the deltas. Dividing the delta by the time elapsed should yield the IO operations per second.

First then, here is a query to run against the Oracle database that will generate the PostgreSQL insert statement:

--
-- reads the gv$iostat_file view and crafts a PostgreSQL insert statement
--

select
  'insert into ORA_IOSTATS_FILE values('||
  vdb.dbid||',now(),'||
--  to_char(sysdate,'YYYYMMDDHH24MISS')||','||
  sum(iof.small_read_reqs)||','||
  sum(iof.small_write_reqs)||','||
  sum(iof.large_read_reqs)||','||
  sum(iof.large_write_reqs)||','||
  sum(iof.small_read_megabytes)||','||
  sum(iof.small_write_megabytes)||','||
  sum(iof.large_read_megabytes)||','||
  sum(iof.large_write_megabytes)||');' "-- postgres insert"
from
  v$database vdb,
  gv$iostat_file iof
where 1=1
and iof.con_id=0
group by vdb.dbid

Running this code against my two-node Oracle RAC database with a Swingbench load, I get the following result:

-- postgres insert
insert into ORA_IOSTATS_FILE values(2053389406,now(),84050833579,51957932,516603,201804,656777588,438119,284112,80844);

Note that I am summing together metrics from all active nodes of the database. For IO operations I am choosing not to break this out by instance. Since these metrics are all cumulative, I need to then calculate the deltas:

with mystats as (
select
  DBID,
  TS,
  SMALL_READ_REQS,
  SMALL_WRITE_REQS,
  LARGE_READ_REQS,
  LARGE_WRITE_REQS,
  SMALL_READ_MEGABYTES,
  SMALL_WRITE_MEGABYTES,
  LARGE_READ_MEGABYTES,
  LARGE_WRITE_MEGABYTES,
  row_number() over (partition by DBID order by TS desc) as rnk
from ORA_IOSTATS_FILE
)
insert into ORA_IOSTATS_FILE_DELTA ( 
select
  a.DBID,
  a.TS,
  cast( extract(epoch from(a.TS-b.TS)) as integer) as "ELAPSED",
  greatest(0,(a.small_read_reqs+a.large_read_reqs)-(b.small_read_reqs+b.large_read_reqs)) as "READS",
  greatest(0,(a.small_write_reqs+a.large_write_reqs)-(b.small_write_reqs+b.large_write_reqs)) as "WRITES",
  greatest(0,(a.small_read_megabytes+a.large_read_megabytes)-(b.small_read_megabytes+b.large_read_megabytes)) as "READ_MB",
  greatest(0,(a.small_write_megabytes+a.large_write_megabytes)-(b.small_write_megabytes+b.large_write_megabytes)) as "WRITE_MB"
from
  mystats a, mystats b
where 1=1
and a.dbid=b.dbid
and a.rnk=1 and b.rnk=2
)

In the above query, the WITH clause generates a intermediate query that ranks the IO operations by time. Subtracting the second rank from the first rank yields the delta. My insert statement into ORA_IOSTATS_FILE_DELTA then combines small and large IO operations, separated by reads and writes. I am also calculating the read megabytes-per-second and write megabytes-per-second.

This query, and the corresponding PostgreSQL insert statement, need to be executed every few seconds to track the IO operations from the Oracle database.  After the code has been executing for a few minutes with a load against the Oracle database, querying the ORA_IOSTATS_DELTA table in PostgreSQL yields a result similar to this:

psql=# select * from ora_iostats_file_delta order by ts desc;

dbid | ts | elapsed | reads | writes | read_mb | write_mb
------------+-------------------------------+---------+-----------+---------+---------+----------
2053389406 | 2023-11-27 20:07:40.06071-06 | 7 | 4425706 | 4022 | 34584 | 33
2053389406 | 2023-11-27 20:07:32.921961-06 | 7 | 4683311 | 3668 | 36601 | 29
2053389406 | 2023-11-27 20:07:25.740403-06 | 7 | 4827584 | 1906 | 37725 | 16
2053389406 | 2023-11-27 20:07:18.353804-06 | 8 | 4781464 | 2464 | 37365 | 19
2053389406 | 2023-11-27 20:07:10.81847-06 | 8 | 4691559 | 4629 | 36663 | 38
2053389406 | 2023-11-27 20:07:03.256896-06 | 7 | 4793083 | 5528 | 37457 | 47
2053389406 | 2023-11-27 20:06:55.778713-06 | 8 | 4745019 | 2910 | 37079 | 27
2053389406 | 2023-11-27 20:06:48.239174-06 | 7 | 4716261 | 2053 | 36854 | 15

The first line of output shows us that in the seven seconds that elapsed from the previous reading to the current one, 4.425 million read operations were executed, 4022 writes, 34.5GB of data were read and 33MB of data were written.

We can now use this table to populate a new panel in Grafana.

In Grafana I add a new panel and select Gauge as the control. I set the data source to my PostgreSQL database and the ORA_IOSTATS_FILE_DELTA table. The column selected is READS and the time column is again TS:

Now I need to edit the SQL to divide the READS by time:

SELECT
ts AS "time",
reads/elapsed
FROM ora_iostats_file_delta
WHERE
$__timeFilter(ts)
ORDER BY 1

I will also use Grafana to set the panel title to Database Reads and replace the default thresholds with a base of green, yellow at 700,000 and red at 900,000. My small PowerMax 2500 storage array is able to deliver well north of one million IOPs, but these thresholds make my dashboard more fun.

grafana_io_guage_2

A second gauge can be added to track writes.

The last part of my dashboard will be a gauge to track throughput. These numbers are also available in the ORA_IOSTATS_FILE_DELTA table.

In my example I added a new panel with a Bar Graph which I formatted as a horizontal retro RGB with a base color of green, a yellow threshold at 3000 and a red threshold of 4000. I set the max value to 5000 so that the control will display coherently at startup before any numbers have been posted.

The query to pull the data is shown below. Like the code for IOPs, the metrics need to be divided by the time elapsed:

 
SELECT
  ts AS "time",
  read_mb/elapsed "READ_MBS",
  write_mb/elapsed "WRITE_MBS"
FROM ora_iostats_file_delta
WHERE
  $__timeFilter(ts)
ORDER BY 1
 

And then new panel can be positioned on my dashboard:

grafana_io_guage_4

Leave a comment