Christian Bilien's Oracle performance and tuning blog

A short, but (hopefully) interesting walk in the HP-UX magic garden


While many in the Oracle blogosphere seemed to be part of a huge party in San Francisco, I was spending some time researching a LGWR scheduling issues on HP-UX for a customer. While working on it, I thought that a short walk in the magic garden may be of interest to some readers (the Unix geeks will remember in the 90”s “The magic garden explained: The Internals of Unix System V Release 4”. It is amazingly still sold by Amazon though second hand copies only are available).

I already blogged on HP-UX scheduling :hpux_sched_noage and hp-ux load balancing on SMPs.

I work with q4, a debugger I use extensively when I teach at HP HP-UX performance or HP-UX internals. This first post on the HP-UX internals will take us on a walk into the process and thread structure, where I’ll highlight the areas of interest to scheduling.

Let’s first copy the kernel to another file before preparing it for debugging:

$cp /stand/vmunix /stand/vmunix.q4
$q4pxdb /stand/vmunix.q4

q4 can then be invoked:

$q4 /dev/mem/stand/vmunix.q4

(or use ied to enable command recall and editing):

$ied -h $HOME/.q4_history q4 /dev/mem /stand/vmunix.q4)

@(#) q4 $Revision: 11.X B.11.23l Wed Jun 23 18:05:11 PDT 2004$ 0
see help topic q4live regarding use of /dev/mem
Reading kernel symbols …
Reading debug information…
Reading data types …
Initialized ia64 address translator …
Initializing stack tracer for EM…

Then let’s load the whole proc structures from the proc table. Unlike older versions, nproc is not the size of hash table but rather the maximum size of a linked list.

q4> load struct proc from proc_list max nproc next p_factp

Here are some fields of interest to the scheduling policy:

p_stat  SINUSE => proc entry in use
p_pri  0x298 => default priority
p_nice  0x14 => nice value
p_ticks   0xf8113a => Number of clock ticks charged to this process
p_schedpolicy  0x2 => Scheduling policy (real time rtsched or rtprio, etc.)

p_pri and p_nice are seen in ps, top, glance, etc.

p_firstthreadp  0xe0000001345d10c0  and p_lastthreadp  0xe0000001345d10c0 

are the backward and forward pointer links to the other threads of the process. They are identical here because the process is single-threaded.

We’ll just keep one process to watch:

q4> keep p_pid == 5252

and we can now load this process thread list:

q4> load struct kthread from kthread_list max nkthread next kt_factp

Here are some of the fields I know of related to scheduling or needed for accessing the next thread:

kt_link  0 => forward run/sleep queue link
kt_rlink  0 => backward run queue link
kt_procp  0xe00000012ffd2000 => a pointer to its proc  structure
kt_nextp  0 => next thread in the same process (no next thread here)
kt_prevp  0 => previous thread in the same process (no previous thread here)
kt_wchan  0xe0000001347d64a8 => sleep wait channel pointer
kt_stat  TSSLEEP => thread status. Not much to do !
kt_cpu  0 => Charge the running thread with cpu time accumulated
kt_spu  0 => the spu number of the run queue that the thread is currently on
kt_spu_wanted  0 => the spu it would like to be on (a context switch will happen if kt_spu  <> kt_spu_wanted)
kt_schedpolicy  0x2 => scheduling policy for the thread
kt_ticksleft   0xa => number of ticks left before a voluntary timeslice will be requested

kt_usrpri  0x2b2
 and  kt_pri  0x29a

are equal while the thread are running in the timeshare scheduling mode. As the thread is sleeping on priority kt_pri, kt_usrpri will contain the re-calculated priorities. On wakeup, the value of kt_usrpri is copied to kt_pri.