The first post I wrote on CPU scheduling described the circumstances in which thread stealing is considered. This post goes further down that road: once there is enough CPU idleness, or threads are starving on every CPU, the scheduler may migrate threads between processors. This post describes the rules enforced by the HP-UX scheduler in versions 11.11 and later.
Before digging into the subject, locality domains (LDOMs) should be explained. A locality domain is basically a cell: four single- or dual-core processors together with their own local memory. Locality domains exist because of inter-cell bus latency, which can greatly increase memory access time. I'll write a post one day about HP-UX partitioning that goes into more detail.
- A mundane_balance() iteration is run within each LDOM. Each processor is assigned a score based on load average AND starvation (remember from post 1 that starvation occurs when a thread assigned to a given processor has not run for 'a long time'). According to the HP system internals course, starvation is given more weight than load average, which makes sense: a CPU hog can run for 80 to 100 ms before giving up the processor to another thread. In any case, an idle processor is always among the best processors.
- In 11.22, the locality domain balancer is called to potentially move a thread from one domain to another.
The outcome is a pair of "best" and "worst" processors. If even the "worst" processor is lightly loaded, with a load average below 0.2, the system is considered well balanced and nothing is done. Otherwise, a thread running on the "worst" processor (it must not be a real-time or locked thread) is selected, removed from its run queue, and inserted into the "best" processor's run queue.
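To make the decision concrete, here is a minimal C sketch of such a balancing pass. The structure and field names (`cpu_score`, `load_avg`, `starving`) and the scoring weights are my own illustrative assumptions, not the real HP-UX internals; only the rules from the text are encoded: starvation outweighs load average, an idle processor is always among the best, and a worst processor below a 0.2 load average means the domain is left alone.

```c
/* Hypothetical per-CPU state; field names are illustrative, not HP-UX's. */
struct cpu_score {
    int cpu_id;
    double load_avg;     /* recent run-queue load average */
    int starving;        /* nonzero if a thread on this CPU is starving */
};

/* Score a CPU: starvation dominates load average (the 1000.0 weight is an
 * arbitrary illustrative constant), so an idle, non-starving CPU
 * (load_avg == 0) always scores among the best. */
static double cpu_badness(const struct cpu_score *c)
{
    return (c->starving ? 1000.0 : 0.0) + c->load_avg;
}

/* One mundane_balance()-style pass over the CPUs of a locality domain:
 * find the best (least bad) and worst (most bad) processors. Returns 0
 * and leaves *best/*worst untouched when the worst CPU is not starving
 * and its load average is under 0.2, i.e. the domain is well balanced;
 * returns 1 when a thread should move from *worst to *best. */
int pick_migration_pair(const struct cpu_score *cpus, int ncpus,
                        int *best, int *worst)
{
    int b = 0, w = 0;
    for (int i = 1; i < ncpus; i++) {
        if (cpu_badness(&cpus[i]) < cpu_badness(&cpus[b])) b = i;
        if (cpu_badness(&cpus[i]) > cpu_badness(&cpus[w])) w = i;
    }
    if (!cpus[w].starving && cpus[w].load_avg < 0.2)
        return 0;               /* well balanced: nothing to do */
    *best = cpus[b].cpu_id;
    *worst = cpus[w].cpu_id;
    return 1;
}
```

With four CPUs where one is idle and another hosts a starving thread, the pair comes out as (idle, starving) even if a third CPU has a higher load average, matching the "starvation beats load average" rule.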
How is this “next” thread selected?
The selection is based on the virtual address of the kthread structure: by walking the candidates in address order across successive balancing passes, the algorithm ensures each thread is cycled through rather than the same thread being moved every time.
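The address-ordered cycling can be sketched as follows. This is a guess at the mechanism, not the real HP-UX code: the `kthread` stand-in has only the fields the sketch needs, and `next_victim()` simply picks the eligible thread whose address is the lowest one strictly above the previously chosen address, wrapping around when none is higher.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a kernel thread; only the fields the selection
 * needs. Names are illustrative, not the real HP-UX kthread layout. */
struct kthread {
    struct kthread *next;   /* run-queue link */
    int realtime;           /* real-time threads must not be moved */
    int locked;             /* nor threads locked to a processor */
};

/* Pick the movable thread with the lowest virtual address strictly above
 * `last` (the address chosen on the previous pass); wrap to the lowest
 * eligible address when none is higher. Because each pass picks the
 * next-higher address, repeated passes cycle through every eligible
 * thread instead of always migrating the same one. */
struct kthread *next_victim(struct kthread *runq, uintptr_t last)
{
    struct kthread *above = NULL, *lowest = NULL;
    for (struct kthread *t = runq; t != NULL; t = t->next) {
        if (t->realtime || t->locked)
            continue;                       /* not eligible to migrate */
        uintptr_t a = (uintptr_t)t;
        if (lowest == NULL || a < (uintptr_t)lowest)
            lowest = t;
        if (a > last && (above == NULL || a < (uintptr_t)above))
            above = t;
    }
    return above != NULL ? above : lowest;  /* wrap around */
}
```

Feeding the previous victim's address back in as `last` on each pass yields each eligible thread in turn, then starts over.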