linux-kernel - High priority tasks break SMP balancer?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20071109223417.GB16250@vmware.com>
Date:	Fri, 9 Nov 2007 14:34:17 -0800
From:	Micah Dowty <micah@...are.com>
To:	linux-kernel@...r.kernel.org
Subject: High priority tasks break SMP balancer?

I've been investigating a problem recently, in which N runnable
CPU-bound tasks on an N-way machine run on only N-1 CPUs. The
remaining CPU is almost 100% idle. I have seen it occur with both the
CFS and O(1) schedulers.

I've traced this down to what seems to be a quirk in the SMP balancer,
whereby a high-priority thread which spends most of its time sleeping
can artificially inflate the CPU load average calculated for one
processor. Most of the time this CPU is idle (nr_running==0) yet its
CPU load average is much higher than that of any other CPU.

Please find attached a sample program which demonstrates this
behaviour on a 2-way SMP machine. It creates three threads: two are
CPU bound and run at the default priority, the third spends most of
its time sleeping and runs at an elevated priority. It wakes up
frequently (using /dev/rtc) and randomly generates some CPU load.

On my machine (2-way Opteron with a vanilla 2.6.23.1 kernel) this test
program will reliably put the scheduler into a state where one CPU has
both of the busy-looping processes in its runqueue, and the other CPU
is usually idle. The usually-idle CPU will have a very high cpu_load,
as reported by /proc/sched_debug.

Your mileage may vary. On some machines, this test program will only
enter the "bad" state for a few seconds. Sometimes we bounce back and
forth between good and bad states every few seconds. In all cases,
removing the priority elevation fixes the balancing problem.

Is this a behaviour any of the scheduler developers are aware of? I
would be very greatful if anyone could shed some light on the root
cause behind the inflated cpu_load average. If this turns out to be a
real bug, I would be happy to work on a patch.

Thanks in advance,
Micah Dowty

View attachment "priosched.c" of type "text/plain" (4076 bytes)