Message-ID: <20071115191408.GA4914@vmware.com>
Date:	Thu, 15 Nov 2007 11:14:08 -0800
From:	Micah Dowty <micah@...are.com>
To:	Kyle Moffett <mrmacman_g4@....com>
Cc:	Cyrus Massoumi <cyrusm@....net>,
	LKML Kernel <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>, Andrew Morton <akpm@...l.org>,
	Mike Galbraith <efault@....de>,
	Paul Menage <menage@...gle.com>,
	Christoph Lameter <clameter@....com>
Subject: Re: High priority tasks break SMP balancer?

On Thu, Nov 15, 2007 at 01:48:20PM -0500, Kyle Moffett wrote:
>> In general, boosting the MAINTHREAD_PRIORITY even more and increasing the 
>> WAKE_HZ should exaggerate the problem. These parameters reproduce the 
>> problem very reliably on my system:
>>
>> #define NUM_BUSY_THREADS            2
>> #define MAINTHREAD_PRIORITY       -20
>> #define MAINTHREAD_WAKE_HZ       1024
>> #define MAINTHREAD_LOAD_PERCENT     5
>> #define MAINTHREAD_LOAD_CYCLES      2
>
> Well from these statistics; if you are requesting wakeups that often then 
> it is probably *not* correct to try to move another thread to that CPU in 
> the mean-time.  Essentially the migration cost will likely far outweigh the 
> advantage of letting it run a little bit of extra time, and in addition it 
> will dump out cache from the high-priority thread.  As per the description 
> I think that an increased priority and increased WAKE_HZ will certainly 
> cause the "problem" to occur more, simply because it reduces the time 
> between wakeups of the high-priority process and makes it less helpful to 
> migrate another process over to that CPU during the sleep periods.  This 
> will also depend on your hardware and possibly other configuration 
> parameters.

The real problem, though, is that the high priority thread only needs
a total of a few percent worth of CPU time. The behaviour which I'm
reporting as a potential bug is that this process which needs very
little CPU time is effectively getting an entire CPU to itself,
despite the fact that other CPU-bound threads could benefit from
having time on that CPU.
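
To put rough numbers on "a few percent", here is a small back-of-the-envelope
sketch in Python. It assumes that MAINTHREAD_WAKE_HZ is the number of wakeups
per second and that MAINTHREAD_LOAD_PERCENT is the fraction of each wakeup
period spent busy; that reading of the parameters is an assumption, and the
function name duty_cycle is made up for illustration.

```python
def duty_cycle(wake_hz, load_percent):
    """Return (period_us, busy_us) for a thread that wakes wake_hz times
    per second and stays busy for load_percent of each period.

    Assumption: this is how the priosched parameters are interpreted;
    the attached C source is the authority on what they really mean.
    """
    period_us = 1_000_000 / wake_hz           # time between wakeups
    busy_us = period_us * load_percent / 100  # CPU time used per wakeup
    return period_us, busy_us


if __name__ == "__main__":
    period_us, busy_us = duty_cycle(wake_hz=1024, load_percent=5)
    idle_us = period_us - busy_us
    print(f"period {period_us:.1f} us, busy {busy_us:.1f} us, "
          f"idle {idle_us:.1f} us per wakeup")
```

Under those assumptions the thread is busy for roughly 49 us out of every
977 us, which leaves about 95% of that CPU unused if nothing else is
scheduled there.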

One could argue that a thread with a high enough priority *should* get
a CPU all to itself even if it won't use that CPU, but that isn't the
behaviour I want. If this is in fact the intended effect of having a
high-priority thread wake very frequently, I should start a different
discussion about how to solve my specific problem without the use of
elevated priorities :)

I don't have any reason to believe, though, that this behaviour was
intentional. I just finished my "git bisect" run, and I found the
first commit after which I can reproduce the problem:

c9819f4593e8d052b41a89f47140f5c5e7e30582 is first bad commit
commit c9819f4593e8d052b41a89f47140f5c5e7e30582
Author: Christoph Lameter <clameter@....com>
Date:   Sun Dec 10 02:20:25 2006 -0800

    [PATCH] sched: use softirq for load balancing
    
    Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
    softirq.
    
    We calculate the earliest time for each layer of sched domains to be rescanned
    (this is the rescan time for idle) and use the earliest of those to schedule
    the softirq via a new field "next_balance" added to struct rq.
    
    Signed-off-by: Christoph Lameter <clameter@....com>
    Cc: Peter Williams <pwil3058@...pond.net.au>
    Cc: Nick Piggin <nickpiggin@...oo.com.au>
    Cc: Christoph Lameter <clameter@....com>
    Cc: "Siddha, Suresh B" <suresh.b.siddha@...el.com>
    Cc: "Chen, Kenneth W" <kenneth.w.chen@...el.com>
    Acked-by: Ingo Molnar <mingo@...e.hu>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@...l.org>
    Signed-off-by: Linus Torvalds <torvalds@...l.org>

:040000 040000 2b66e4b500403869cf5925367b1ddcbb63948d01 33a27ddb470473f129a78ce310cfc59a605e173b M      include
:040000 040000 939b60deffb2af2689b4aab63e21ff6c98a3b782 dd3bf32eea9556d5a099db129adc048396368adc M      kernel

> I'm not really that much of an expert in this particular area, though, so 
> it's entirely possible that one of the above-mentioned scheduler 
> head-honchos will poke holes in my argument and give a better explanation 
> or a possible patch.

Thanks! I've also CC'ed Christoph.

For reference, the exact test I used with git-bisect is attached. The
C program (priosched) starts two busy-looping threads and a
high-priority high-frequency thread which uses relatively little
CPU. The Python program repeatedly starts the C program, runs it for a
half second, and measures the resulting imbalance in CPU usage. On
kernels prior to the above commit, this reports values within about
10% of 1.0. On later kernels, the Python program crashes within a couple
of iterations due to a divide-by-zero error :)
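
For readers without the attachments, the following is a minimal sketch of
one way such an imbalance ratio could be computed without crashing when a
sample is zero. It is not the attached test-priosched.py; the function name
imbalance_ratio and its two-sample interface are hypothetical, and exactly
what the real script divides by is not shown in this message.

```python
def imbalance_ratio(usage_a, usage_b):
    """Return the smaller CPU-usage sample divided by the larger one.

    1.0 means the two samples are perfectly balanced; values near 0.0
    mean one side got almost no CPU time. Returns None when both
    samples are zero, since the ratio is undefined in that case.
    (Hypothetical helper, not taken from the attached script.)
    """
    if usage_a < 0 or usage_b < 0:
        raise ValueError("CPU usage samples must be non-negative")
    larger = max(usage_a, usage_b)
    if larger == 0:
        return None  # nothing ran; avoid ZeroDivisionError
    return min(usage_a, usage_b) / larger


if __name__ == "__main__":
    # Example samples in seconds of CPU time over a half-second run.
    for a, b in [(0.48, 0.50), (0.0, 0.50), (0.0, 0.0)]:
        print(a, b, imbalance_ratio(a, b))
```

Dividing by the larger sample means a starved thread shows up as a ratio of
0.0 rather than as an exception, which keeps a bisect script running on the
bad kernels as well as the good ones.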

Thanks again,
--Micah

View attachment "test-priosched.py" of type "text/plain" (711 bytes)

View attachment "priosched.c" of type "text/plain" (4076 bytes)