The first post on this topic was presenting the normal HP-UX timeshare scheduling, under which all processes will run. Its main objective was to give the rules for context switching (threads releasing CPUs and getting a CPU back), before comparing it to the timeshare no age scheduling.
The decaying priority scheme has not been altered since the very early HP-UX workstations, at a time when I assume data bases were a remote if not non existent consideration to the HP-UX designers.
As a thread gets weaker because of a decreasing priority, its probability of being switched out will increase. If this process holds a critical resource (an Oracle latch, not even mentioning the log writer), this context switch will cause resource contention should other threads wait for the same resource.
A context switch (you can see them with in the pswch/s column of sar -w) is a rather expensive task: the system scheduler makes a copy of all the switched out process registries and copy into a private area known as the UAREA. The reverse operation needs to be performed for the incoming thread: its UAREA will be copied in the CPU registries. Another problem will of course occurs when CPU switches also happen on top of thread switches: modified cache lines need to be invalidated and read lines reloaded on the new CPU. I once read that a context switch would cost around 10 microseconds (I did not verify it myself), which is far for negligible. The HP-UX internals manuals mentions 10000 CPU cycle, which would indeed be translated into 10 microseconds with a 1Ghz CPU.
So when does a context switch occurs? HP-UX considers two cases: forced and voluntary context switches. Within the Oracle context, voluntary context switches are most of the time disk I/Os. A forced context switch will occur at the end of a timeslice when a thread with an equal or higher priority is runnable, or when a process returns from a system call or trap and a higher-priority thread is runnable.
Oradebug helps diagnosing the LGWR forced and volontary switches:
SQL> oradebug setospid 2022
Oracle pid: 6, Unix process pid: 2022, image: oracle@mymachine12 (LGWR)
SQL> oradebug unlimit
SQL> oradebug procstat
SQL> oradebug tracefile_name
SQL> !more /oracle/product/10.2.0/admin/MYDB/bdump/mydb_lgwr_2022.trc
Voluntary context switches = 272
Involuntary context switches = 167
The sched no age policy was added in HP-UX11i: it is part of the 178-225 range (the same range as the normal time share scheduler user priorities), but a thread running in this scheduler will not experience any priority decay: its priority will remain constant over time although the thread may be timesliced. Some expected benefits:
- Less context switches overhead (lower cpu utilization)
- The cpu hungry transactions may complete faster in a CPU busy system (the less hungry transactions may run slower, but as they do not spend much time on the CPUs, it may not be noticeable).
- May help resolve an LGWR problem: the log writer inherently experiences a lot of voluntary context switches, which helps keeping a high priority. However, it may not be enough on CPU-busy systems.
As the root user, give the RTSCHED and RTPRIO privileges to the dba group setprivgrp dba RTSCHED RTPRIO. Create the /etc/privgroup file, if it does not exist, and add the following line to it: dba RTSCHED RTPRIO. Finally update the spfile/init with hpux_sched_noage=178 to 255. See rtsched(1).