In this post I am going to explore adding a new ASM diskgroup to an Oracle 19c RAC using PowerMax storage.
We will follow best practices as laid out in H17390, the Deployment Best Practices Guide for Oracle with PowerMax. This post is a companion to the video Create Oracle ASM diskgroups with Dell PowerMax and PowerPath.
This blog post assumes some basic knowledge of PowerMax storage concepts. If you need a basic introduction, please check out my PowerMax basics video.
These examples will use the Solutions Enabler command line method of managing the PowerMax, unlike the video which uses the Unisphere graphical interface.
The H17390 document referenced above makes several specific recommendations with respect to the provisioning of storage for an Oracle database when using a Dell PowerMax storage array:
- Keep ASM DG in separate PMAX SGs
- Separate +DATA, +REDO and +FRA Oracle ASM disk-groups
- ASM FD with thin provisioning allows for storage reclamation
- Stripe +REDO with ASM fine grain striping
- At least five REDO log groups, 8GB member size
- External Redundancy (except +GRID)
So let’s go through these one by one:
Keep ASM Diskgroups in separate PowerMax Storage Groups.
The PowerMax provides many powerful tools to ensure optimum performance and efficiency for your ASM diskgroups, including the ability to specify performance characteristics, encrypt data, compress data, replicate entire diskgroups, and clone entire diskgroups using storage snapshot technology.
Not all ASM diskgroups will need all of these features. For example, it is unlikely the DBA would want to replicate the TEMP diskgroup using the PowerMax's SRDF replication facility, and the main +DATA and +FRA diskgroups will probably not need the same performance settings.
PowerMax functionality could be specified on a disk-by-disk basis, but that would quickly become cumbersome. Instead, the optimal solution is to group the member disks of each ASM diskgroup into their own PowerMax Storage Group (SG).
This way, PowerMax features can be applied to a specific ASM diskgroup simply by applying them to the underlying Storage Group.
We will explore this concept further in the next section.
Separate +DATA, +REDO and +FRA Oracle ASM diskgroups.
Most databases that use ASM have a distinct +DATA and +FRA diskgroup as a minimum. We also recommend separating redo logs into their own dedicated +REDO ASM diskgroup, especially for high performance production databases.
Keeping redo logs in their own dedicated diskgroup, and correspondingly their own SG, allows the DBA to select the optimal performance settings for the logs, and also allows for a more granular restore of a storage snapshot of the database.
So let’s start by using Solutions Enabler to create a new PowerMax Storage Group for the +REDO ASM diskgroup. In this example we are naming our new Storage Group swingredo_sg as this storage is intended for a Swingbench database.
symsg create swingredo_sg -sl Diamond -srp SRP_1 -nocompression
In the above example we have created a new PowerMax SG called swingredo_sg. We have specified a service level of Diamond for the best performance, and that the storage will be taken from Storage Resource Pool SRP_1.
Service Levels have replaced the RAID specifications that some DBAs may recall from the past. It is no longer necessary to specify a RAID level for our database volumes; instead, the Service Level ensures our disks get the performance necessary to support the database.
However, we have specified that these volumes will NOT have storage compression applied to them. The PowerMax provides excellent storage compression, typically yielding over 3:1 with most Oracle datasets, assuming the DBA has not used database-level compression or encryption. However, storage compression adds a slight amount of write latency, which we do not want on performance-sensitive redo logs. Also, the capacity consumed by redo logs is a fraction of the overall storage of the database, so the cost of not compressing them is negligible.
Now that we have established an SG for our redo logs, we can create the PowerMax volumes. Recommendations vary, but most agree that an ASM diskgroup should have more than just one disk. The standard recommendation from Dell is to use a minimum of four ASM disks in a diskgroup, but your situation may dictate more or fewer. In this example we are going to create three volumes and name them SWING_REDO0, SWING_REDO1 and SWING_REDO2. Our new volumes will be 50GB in size:
symdev create -v -tdev -device_name SWING_REDO -number 0 -starting_dev_num 0 -cap 50 -captype gb -N 3
STARTING a TDEV Create Device operation on Symm 123456789304.
Wait for device create...............................Started.
Wait for device create...............................Done.
The TDEV Create Device operation SUCCESSFULLY COMPLETED: 3 devices created.
3 TDEVs create requested in request 1 and devices created are 3[ 0013B:0013D ]
STARTING Set/Reset device attribute operation on Symm 123456789304.
The Device Attribute Set/Reset operations SUCCESSFULLY COMPLETED: Attributes of 1 device(s) modified.
Successful Set/Reset device(s) in req# 1: [0013B]
STARTING Set/Reset device attribute operation on Symm 123456789304.
The Device Attribute Set/Reset operations SUCCESSFULLY COMPLETED: Attributes of 1 device(s) modified.
Successful Set/Reset device(s) in req# 1: [0013C]
STARTING Set/Reset device attribute operation on Symm 123456789304.
The Device Attribute Set/Reset operations SUCCESSFULLY COMPLETED: Attributes of 1 device(s) modified.
Successful Set/Reset device(s) in req# 1: [0013D]
Make a note of the device IDs that were created: 0013B, 0013C and 0013D. We are going to need these to add our new volumes to the SG we created:
symsg -sg swingredo_sg addall -devs 0013B:0013D
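At this point it is worth a quick sanity check that the new devices actually landed in the SG. Solutions Enabler can display the group and its members; the exact output layout will vary with your array model and Solutions Enabler version:
symsg show swingredo_sg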
Next we need to repeat these steps for the +DATA and +FRA diskgroups, plus any other diskgroups you might need for your database. By the end of this process you will probably have a minimum of three SGs. In our example these are swingredo_sg, swingdata_sg and swingfra_sg.
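As an illustration, the +DATA steps might look like the following sketch, which simply repeats the commands we used for +REDO. The SWING_DATA device name, the four 512GB volumes and the device ID range are assumptions for this example, and you should substitute the device IDs reported by your own symdev create output. Note that we leave storage compression enabled for +DATA, since the write latency concern only applies to the redo logs:
symsg create swingdata_sg -sl Diamond -srp SRP_1
symdev create -v -tdev -device_name SWING_DATA -number 0 -starting_dev_num 0 -cap 512 -captype gb -N 4
symsg -sg swingdata_sg addall -devs 0014B:0014E # use the device IDs from the symdev create output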
Having one SG per ASM diskgroup provides a highly granular level of control. But it might also become tedious to manage so many groups. Hence we can group SGs together. An SG can include not only volumes, but also other SGs in a parent-child relationship. In this example we are going to group swingredo_sg and swingdata_sg into a parent SG called swing_sg.
symsg -sg swing_sg add sg swingdata_sg,swingredo_sg
Now any actions, such as masking, replication or snapshots against swing_sg will affect swingredo_sg and swingdata_sg, making such operations simpler to manage.
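If you want to confirm the parent/child relationship, displaying the parent SG should list its child SGs and their devices (again, output layout varies by Solutions Enabler version):
symsg show swing_sg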
At this point we have volumes organized into SGs, and child SGs grouped together in parent SGs. Now we need to expose our volumes to the database hosts where Oracle is running.
For our database hosts to talk to our volumes, IO traffic would typically use a protocol like Fibre Channel that connects storage to servers. A detailed discussion of Fibre Channel and how it compares to alternatives is beyond the scope of this post, and frankly beyond my scope as a technologist. But what we need to do is to expose our SGs to Initiators and Ports.
Initiators are basically IO ports on HBA cards installed on the database servers. Ports on the other hand are IO ports on an IO Module installed on a Node inside of the PowerMax. Typically there would also be a SAN switch between the two but again, that is beyond the scope of this document.
Indeed, it is unlikely a DBA would need to create Port Groups (PGs) and Initiator Groups (IGs) for database storage, as that function is typically handled by a storage engineer. But in some cases the DBA and the storage engineer might be one and the same, and even if the DBA team does not get involved in this step, the process is shown below for education and enlightenment.
In the following example we create a Port Group for our PowerMax called 188_pg. We add in 16 IO ports we want to include in this PG, which could be all of the available ports, or a subset, depending on how we want to segregate and prioritize IO traffic, and how physical ports are connected to the wider SAN fabric.
symaccess -type port -name 188_pg create
symaccess -type port -name 188_pg add -dirport 1D:4,1D:5,1D:6,1D:7
symaccess -type port -name 188_pg add -dirport 2D:4,2D:5,2D:6,2D:7
symaccess -type port -name 188_pg add -dirport 1D:8,1D:9,1D:10,1D:11
symaccess -type port -name 188_pg add -dirport 2D:8,2D:9,2D:10,2D:11
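To double-check which director ports ended up in the Port Group, we can display it; this is just a sanity check and the output layout varies:
symaccess show 188_pg -type port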
Next we’ll create two Initiator Groups called dsib0144_ig and dsib0146_ig. The first IG will contain a list of initiators which are the IO ports on the HBA cards in the first database server. The second IG contains a list of the IO ports of the HBA cards in the second database server. This database is a 2-node Oracle RAC.
symaccess -type initiator -name dsib0144_ig create
symaccess -type initiator -name dsib0144_ig add -wwn 10000090faa910b2
symaccess -type initiator -name dsib0144_ig add -wwn 10000090faa910b3
symaccess -type initiator -name dsib0144_ig add -wwn 10000090faa90f86
symaccess -type initiator -name dsib0144_ig add -wwn 10000090faa90f87
symaccess -type initiator -name dsib0146_ig create
symaccess -type initiator -name dsib0146_ig add -wwn 10000090faa910aa
symaccess -type initiator -name dsib0146_ig add -wwn 10000090faa910ab
symaccess -type initiator -name dsib0146_ig add -wwn 10000090faa910ae
symaccess -type initiator -name dsib0146_ig add -wwn 10000090faa910af
We now have two IGs, but since this is one RAC database we will likely want to treat them logically as one for most purposes, so we’ll create a parent IG and add our two node-specific IGs to it.
symaccess -type initiator -name db_ig create # Parent IG for RAC
symaccess -type initiator -name db_ig add -ig dsib0144_ig
symaccess -type initiator -name db_ig add -ig dsib0146_ig
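The same kind of check works for the Initiator Groups. Displaying the parent IG should show the two node IGs and, with them, the WWNs of the HBA ports:
symaccess show db_ig -type initiator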
At this point we have volumes organized into Storage Groups (SGs). We have a Port Group (PG) with the PowerMax IO ports we will use for our database IO, and we have an Initiator Group (IG) with the HBA ports of the database servers. What we need to do next is to expose the volumes in our Storage Groups to the Initiators of our Initiator Group via the ports of our Port Group. This is where we use a Masking View:
symaccess create view -name swingredo_mv -pg 188_pg -ig db_ig -sg swing_sg
The Masking View swingredo_mv allows initiators in IG db_ig to access volumes in SG swing_sg using ports in PG 188_pg. Once this command completes, a SCSI bus rescan on the database hosts should reveal the existence of our new volumes. This approach also allows us to remove SGs from a masking view to remove volumes from a host, perhaps for maintenance or to force an outage.
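A quick way to confirm that the Masking View ties the right SG, PG and IG together is to display it:
symaccess show view swingredo_mv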
sg3_utils is a popular package for managing the SCSI bus on a Linux server. The following command will do an intrusive rescan of the bus, adding new devices it finds and removing dead devices. If you omit the switches it will perform a less intrusive, faster rescan, but dead devices will not be removed.
/usr/bin/rescan-scsi-bus.sh -a -r
Checking the /dev/sd* device list on the database hosts should now show new devices. However if you already have a large number of devices, or you are adding a large number, determining which devices are which can be challenging. The Solutions Enabler package provides some helpful tools. In this example we are querying the devices of the swingdata_sg SG:
[root@dsib0251 ~]# symdev list -sg swingdata_sg
Symmetrix ID: 123456789304
Device Name Dir Device
---------------------------- ------- -------------------------------------
Cap
Sym Physical SA :P Config Attribute Sts (GB)
---------------------------- ------- -------------------------------------
0014B Not Visible ***:*** TDEV N/Grp'd RW 512.0
0014C Not Visible ***:*** TDEV Grp'd RW 512.0
0014D Not Visible ***:*** TDEV Grp'd RW 512.0
0014E Not Visible ***:*** TDEV Grp'd RW 512.0
We see four devices at 512GB each, with Sym IDs of 0014B, 0014C, 0014D and 0014E. If these device IDs and capacities match what you created earlier, we have found the right devices. Next we can pull up the WWN IDs of our new volumes:
[root@dsib0251 ~]# symdev list -sg swingdata_sg -wwn
Symmetrix ID: 123456789304
Device Name Device
---------------------------- --------------------------------------------------
Sym Physical Config Attr WWN
---------------------------- --------------------------------------------------
0014B Not Visible TDEV 60000970123456789304533030313442
0014C Not Visible TDEV 60000970123456789304533030313443
0014D Not Visible TDEV 60000970123456789304533030313444
0014E Not Visible TDEV 60000970123456789304533030313445
In our example we are using PowerPath for multipathing and device management. PowerPath will aggregate all of the available paths to a given volume, balance traffic across those paths, and also optionally ensure that device names are consistent across clusters. Now that we have the WWN IDs of our volumes, we can look to see how PowerPath has managed them:
[root@dsib0251 ~]# for node in $(echo 3442 3443 3444 3445); do powermt display dev=all | grep -b3 -a6 $node; done
44434-
44435-Pseudo name=emcpowerbp
44458-Symmetrix ID=123456789304
44484-Logical device ID=0000014B
44511:Device WWN=60000970123456789304533030313442
44555-state=alive; policy=SymmOpt; queued-IOs=0; protocol=SCSI; size=512.00G
44626-==============================================================================
44705---------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
44784-### HW Path I/O Paths Interf. Mode State Q-IOs Errors
44863-==============================================================================
44942- 15 lpfc sdei OR 2c:00 active alive 0 0
48833- 15 lpfc sddc OR 2c:00 active alive 0 0
48912- 15 lpfc sdag OR 1c:00 active alive 0 0
48991-
48992-Pseudo name=emcpowercb
49015-Symmetrix ID=123456789304
49041-Logical device ID=0000014C
49068:Device WWN=60000970123456789304533030313443
49112-state=alive; policy=SymmOpt; queued-IOs=0; protocol=SCSI; size=512.00G
49183-==============================================================================
49262---------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
49341-### HW Path I/O Paths Interf. Mode State Q-IOs Errors
49420-==============================================================================
49499- 18 lpfc sdvl OR 2c:06 active alive 0 0
49973- 17 lpfc sdpt OR 1c:04 active alive 0 0
50052- 15 lpfc sdbn OR 1c:00 active alive 0 0
50131-
50132-Pseudo name=emcpowercc
50155-Symmetrix ID=123456789304
50181-Logical device ID=0000014D
50208:Device WWN=60000970123456789304533030313444
50252-state=alive; policy=SymmOpt; queued-IOs=0; protocol=SCSI; size=512.00G
50323-==============================================================================
50402---------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
50481-### HW Path I/O Paths Interf. Mode State Q-IOs Errors
50560-==============================================================================
50639- 17 lpfc sdpu OR 1c:04 active alive 0 0
51113- 16 lpfc sdhg OR 1c:02 active alive 0 0
51192- 15 lpfc sdbo OR 1c:00 active alive 0 0
51271-
51272-Pseudo name=emcpowercd
51295-Symmetrix ID=123456789304
51321-Logical device ID=0000014E
51348:Device WWN=60000970123456789304533030313445
51392-state=alive; policy=SymmOpt; queued-IOs=0; protocol=SCSI; size=512.00G
51463-==============================================================================
51542---------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
51621-### HW Path I/O Paths Interf. Mode State Q-IOs Errors
51700-==============================================================================
51779- 18 lpfc sdsr OR 1c:06 active alive 0 0
So our four new +SWING_DATA volumes are PowerPath devices: emcpowerbp, emcpowercb, emcpowercc and emcpowercd. If devices do not show up, it may be necessary to run powermt check dev=all and then rescan the SCSI bus again. Repeat the SCSI bus rescan on all nodes of the RAC cluster to ensure all nodes can see the new volumes.
ASM FD with thin provisioning allows for storage reclamation
When the volumes are found on the other nodes of the RAC cluster, it is quite likely that they will have different PowerPath aliases than on the first node. If you wish to maintain consistent naming across the cluster, you can use the following PowerPath command on the first node to export the device names:
emcpadm export_mappings -f /tmp/emcp_mappings.xml
Then copy the mappings file you just created to the other database nodes and import the file using:
emcpadm import_mappings -f /tmp/emcp_mappings.xml
powermt save
PowerPath can only assign the preferred name if it is not already in use, so if you wish to maintain device naming consistency across your RAC cluster, start the practice and maintain it diligently. However, with the advent of the ASM Filter Driver from Oracle, it is not necessary to maintain device naming consistency, as ASM will respect the ASM label on the device rather than the device name.
The ASM Filter Driver also has the ability to reclaim storage from thinly provisioned volumes. All volumes on the PowerMax are thin provisioned, so if the DBAs later decide to drop a tablespace or a data file, the capacity will be released back to the PowerMax and can be quickly used elsewhere.
We can label our new volumes as ASM disks using the asmcmd command:
asmcmd afd_label SWINGDATA00 /dev/emcpowerbp
asmcmd afd_label SWINGDATA01 /dev/emcpowercb
asmcmd afd_label SWINGDATA02 /dev/emcpowercc
asmcmd afd_label SWINGDATA03 /dev/emcpowercd
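Before creating the diskgroup, it is worth confirming that the AFD labels were applied. Running the following as the grid user should list the labelled devices and their labels:
asmcmd afd_lslbl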
Now we can create our new ASM diskgroup using the volumes we have created on the PowerMax. In this example we are going to use EXTERNAL redundancy as the PowerMax will handle storage redundancy automatically and much more efficiently than ASM will do natively. We will also specify an AU_SIZE of 4MB, which is a suitable size for most workloads.
asmca -silent -createDiskGroup -diskString 'AFD:*' -diskGroupName SWINGDATA -diskList AFD:SWINGDATA00,AFD:SWINGDATA01,AFD:SWINGDATA02,AFD:SWINGDATA03 -redundancy EXTERNAL -au_size 4 -compatible.asm 19.0.0 -compatible.rdbms 19.0.0
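If the create succeeds, the new diskgroup should be visible from asmcmd on any node of the cluster; a quick check (the exact columns depend on your version) is:
asmcmd lsdg SWINGDATA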
By default our new ASM diskgroup will not use thin provisioning, meaning that if we drop data files, the space will not be reclaimed by the PowerMax. To enable thin provisioning and storage reclamation we need to set the following attribute using ASM:
alter diskgroup SWINGDATA set attribute 'thin_provisioned'='TRUE';
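We can verify the attribute took effect by listing the diskgroup attributes, either by querying v$asm_attribute from the ASM instance or with asmcmd; look for thin_provisioned set to TRUE in the output:
asmcmd lsattr -l -G SWINGDATA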
Stripe +REDO with ASM fine grain striping
By default ASM uses a 1MB stripe size, which is ideal for most database workloads. For high-performance, write-intensive systems, however, a smaller 128KB fine-grained stripe size may offer some advantages for redo logs. To have any benefit, the stripe size should be set on the ASM diskgroup's online log template before any redo logs are created; alternatively, the DBA can recreate the redo logs after the change is implemented:
SQL> alter diskgroup SWINGREDO alter template onlinelog attributes (fine);
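To confirm the change, the ASM instance exposes the striping setting of each template in v$asm_template. A query along these lines (a sketch, run from the ASM instance) should now show FINE for the ONLINELOG template of the SWINGREDO diskgroup:
select t.name, t.stripe
from v$asm_template t, v$asm_diskgroup d
where t.group_number = d.group_number
and d.name = 'SWINGREDO' and t.name = 'ONLINELOG';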
At least five REDO log groups, 8GB member size
When using DBCA to create a new database, the default is to create three redo log groups, with two members each, and each log sized at 50MB. This default dates back many years and is mostly unsuitable for modern high-performance systems. For any non-trivial workload, a minimum of five redo log groups per thread should be considered, with each member sized at 8GB or larger. This allows the log writer and archiver to keep up with the write activity of the database.
In the following example we use a SQL script to generate a script that drops redo logs which do not meet the 8GB criterion and then adds new ones that do. The script allows for a non-disruptive replacement of existing redo logs, provided the database is in archivelog mode.
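The actual script is attached below, but conceptually the generator is simple. A minimal sketch (an assumption, not the attached script itself) could query v$log for any group smaller than 8GB and emit a matching drop/add pair for each one; the attached script additionally interleaves log switches, archiving and checkpoints so that active groups can be dropped safely:
select 'alter database drop logfile group ' || group# || ';' || chr(10) ||
       'alter database add logfile thread ' || thread# || ' group ' || group# || ' size 8G blocksize 512;' as command
from v$log
where bytes < 8*1024*1024*1024;
Running the full script from SQL*Plus produces output like this: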
SQL> @switch_all_8G.sql
--command
-------------------------------
alter database drop logfile group 1;
alter database add logfile thread 1 group 1 size 8G blocksize 512;
alter database drop logfile group 4;
alter database add logfile thread 2 group 4 size 8G blocksize 512;
alter system switch logfile;
alter system archive log current;
alter system checkpoint;
alter database drop logfile group 2;
alter database add logfile thread 1 group 2 size 8G blocksize 512;
alter system switch logfile;
alter system archive log current;
alter system checkpoint;
alter database drop logfile group 3;
alter database add logfile thread 2 group 3 size 8G blocksize 512;
Existing redolog resize script.
The above script replaces the existing non-conforming logs. The following script adds additional redo log groups to each thread until each thread has a minimum of five groups.
SQL> @add_missing_logs.sql
Add missing redolog groups script.
Lastly, we should verify that each thread now has the expected number of redo log groups, with members of the required size in both +SWING_REDO and +SWING_FRA.
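A query along the following lines, joining v$log and v$logfile, can be used for this check (a sketch; ordering and column formatting may differ from the output shown below):
select l.group#, l.thread#, l.bytes/1024/1024 size_mb, l.blocksize bs, l.members, lf.type, lf.member
from v$log l, v$logfile lf
where l.group# = lf.group#
order by l.thread#, l.group#;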
GROUP# THREAD# SIZE_MB BS MEMBERS TYPE MEMBER
------ ------- ---------- ---- ---------- ------- ------------------------------------------------------------
6 1 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_6.267.1056747647
7 1 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_7.268.1056747667
8 1 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_8.266.1056747557
9 1 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_9.269.1056747689
10 1 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_10.270.1056747705
6 1 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_6.267.1056747647
7 1 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_7.268.1056747667
8 1 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_8.266.1056747557
9 1 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_9.269.1056747689
10 1 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_10.270.1056747705
1 2 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_1.262.1056747642
2 2 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_2.263.1056747643
3 2 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_3.264.1056747644
4 2 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_4.265.1056747645
5 2 2048 512 2 ONLINE +SWING_REDO/SWING19C/ONLINELOG/group_5.266.1056747646
1 2 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_1.262.1056747642
2 2 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_2.263.1056747643
3 2 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_3.264.1056747644
4 2 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_4.265.1056747645
5 2 2048 512 2 ONLINE +SWING_FRA/SWING19C/ONLINELOG/group_5.266.1056747646
In upcoming blog posts I will show how to make snapshot clones of our database using ASM on PowerMax, and also how to replicate a database using SRDF – the powerful PowerMax storage replication technology.
Acknowledgements.
Sincere thanks to Yaron Dar at Dell for his generous help and allowing me time on his PowerMax lab system.
