Oracle 19c: ASM and SGA_TARGET lead to crashes
2020-11-24
A few months ago I ran into a frustrating issue while testing Oracle 19c. We usually configure ASM to use huge pages, which prevents the ASM SGA from being swapped out under memory pressure. To do so, we prepare the ASM instance with the following settings:
alter system reset memory_max_target scope=spfile;
alter system set memory_target=0 scope=spfile;
alter system set sga_max_size=1088M scope=spfile;
alter system set use_large_pages='ONLY' scope=spfile;
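Since use_large_pages='ONLY' prevents the instance from starting when it cannot get its SGA from huge pages, it can be worth a quick sanity check before the restart. The following is only a rough sketch: the helper name pages_needed is made up for illustration, the 2048 kB fallback assumes a typical x86_64 Linux host, and the result is a lower bound (the real requirement may be slightly higher).

```shell
#!/bin/sh
# Sketch: lower bound on the number of huge pages needed for a given SGA size.
# pages_needed SGA_MB HUGEPAGE_KB -> prints ceil(SGA_MB * 1024 / HUGEPAGE_KB)
# (pages_needed is a hypothetical helper, not an Oracle tool)
pages_needed() {
    sga_kb=$(( $1 * 1024 ))
    echo $(( (sga_kb + $2 - 1) / $2 ))
}

# Huge page size as reported by the Linux kernel, in kB.
# Falls back to 2048 kB (assumed x86_64 default) if it cannot be read.
hp_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null || true)
hp_kb=${hp_kb:-2048}

echo "Huge pages needed for a 1088M SGA: $(pages_needed 1088 "$hp_kb")"

# Show what is currently configured and free, for comparison.
grep -E '^HugePages_(Total|Free):' /proc/meminfo 2>/dev/null || true
```

With 2 MB huge pages, the 1088M SGA used here needs at least 544 pages, on top of whatever the database instances on the same host already consume.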
And then restart the instance. Chances are we also want to set SGA_TARGET to the same value as SGA_MAX_SIZE. So, in my lab, I did the following:
alter system set sga_target=1088M scope=both;
Nothing special. The parameter is dynamic, I only want to raise it to the same value as SGA_MAX_SIZE, and ASM already uses the ASMM memory configuration (we just disabled MEMORY_TARGET), so what can go wrong?
Boom! The ASM instance hung for a few seconds, then crashed and restarted, taking the database down with it. The ASM alert log shows the following:
2020-04-16T09:47:35.506902+02:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x694] [PC:0x1C015A7, kcbw_register_start_resize()+55] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_mman_3435.trc (incident=43257):
ORA-07445: exception encountered: core dump [kcbw_register_start_resize()+55] [SIGSEGV] [ADDR:0x694] [PC:0x1C015A7] [Address not mapped to object]
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM/incident/incdir_43257/+ASM_mman_3435_i43257.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-04-16T09:47:36.809662+02:00
Dumping diagnostic data in directory=[cdmp_20200416094736], requested by (instance=1, osid=3435 (MMAN)), summary=[incident=43257].
2020-04-16T09:47:40.446580+02:00
PMON (ospid: 3422): terminating the instance due to ORA error 822
2020-04-16T09:47:40.453718+02:00
Cause - 'Instance is being terminated due to fatal process death (pid: 7, ospid: 3435, MMAN)'
2020-04-16T09:47:40.466726+02:00
System state dump requested by (instance=1, osid=3422 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_diag_3442.trc
2020-04-16T09:47:40.593505+02:00
Dumping diagnostic data in directory=[cdmp_20200416094740], requested by (instance=1, osid=3422 (PMON)), summary=[abnormal instance termination].
2020-04-16T09:47:41.669396+02:00
Instance terminated by PMON, pid = 3422
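If you suspect one of your own ASM instances hit the same problem, a simple scan of the alert log for the signatures above is usually enough to tell. This is just a sketch: scan_alert_log and the ALERT_LOG variable are names I made up, and the default file name alert_+ASM.log is an assumption based on the trace directory shown in the log, so adjust it to your environment.

```shell
#!/bin/sh
# Sketch: print the alert log lines (with line numbers) that match this failure.
# scan_alert_log is a hypothetical helper, not an Oracle utility.
scan_alert_log() {
    grep -n -E 'ORA-07445|kcbw_register_start_resize|terminating the instance|Instance terminated by' "$1"
}

# ALERT_LOG is a hypothetical variable; the default path is an assumption
# derived from the trace directory that appears in the alert log.
ALERT_LOG=${ALERT_LOG:-/u01/app/oracle/diag/asm/+asm/+ASM/trace/alert_+ASM.log}

if [ -r "$ALERT_LOG" ]; then
    scan_alert_log "$ALERT_LOG" || echo "no matching lines in $ALERT_LOG"
else
    echo "alert log not readable: $ALERT_LOG" >&2
fi
```

A hit on kcbw_register_start_resize in an MMAN trace, followed shortly by PMON terminating the instance, is the combination to look for.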
It took Oracle 6 months to develop the fix, and it only works on RU 19.6. Meanwhile, three more RUs came out (19.7, 19.8 and 19.9) without the fix. According to the support engineer, it will be included in the next RU, 19.10. So technically speaking, I still need a fix for the latest RU, which is the one I would like to use.
If you want to try the fix, you can download interim patch 31580122 from MOS (it is specific to Oracle 19.6). Bear in mind that this is a GRID patch: you need to apply it with opatchauto as root (chances are the README within the patch contains the wrong steps):
# opatchauto apply -oh $GRID_HOME
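Once opatchauto has finished, you may want to confirm that the patch is actually registered in the Grid home. A minimal sketch follows, assuming that opatch lspatches prints one line per patch starting with the patch id and a semicolon; has_patch is a hypothetical helper, and opatch itself is normally run as the Grid home owner rather than as root.

```shell
#!/bin/sh
# has_patch PATCH_ID: reads an `opatch lspatches` listing on stdin and
# succeeds if PATCH_ID is listed. Assumes lines of the form "<patch id>;<description>".
# (has_patch is a hypothetical helper, not part of OPatch)
has_patch() {
    grep -q "^$1;"
}

# GRID_HOME is the same variable used in the opatchauto command.
# The check is skipped when opatch cannot be found under it.
if [ -x "${GRID_HOME:-}/OPatch/opatch" ]; then
    if "$GRID_HOME/OPatch/opatch" lspatches | has_patch 31580122; then
        echo "patch 31580122 is installed in $GRID_HOME"
    else
        echo "patch 31580122 is NOT installed in $GRID_HOME"
    fi
fi
```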
In any case, please, please test it on a non-production system before deploying it in production!
Enjoy this odd bug! 🙂