linux-kernel - Re: [PATCH] sched: fix sched-domain avg

Message-ID: <BANLkTin4aQ7Gc+h0YCPo=Eokc2iz06WNbw@mail.gmail.com>
Date:	Fri, 8 Apr 2011 12:29:33 -0700
From:	Ken Chen <kenchen@...gle.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	mingo@...e.hu, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: fix sched-domain avg_load calculation.

On Fri, Apr 8, 2011 at 4:15 AM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> On Thu, 2011-04-07 at 17:23 -0700, Ken Chen wrote:
>> In function find_busiest_group(), the sched-domain avg_load isn't
>> calculated at all if there is a group imbalance within the domain.
>> This will cause erroneous imbalance calculation.  The reason is
>> that calculate_imbalance() sees sds->avg_load = 0 and it will dump
>> entire sds->max_load into imbalance variable, which is used later
>> on to migrate entire load from busiest CPU to the puller CPU. It
>> has two really bad effects:
>>
>> 1. stampede of task migration, and they won't be able to break out
>>    of the bad state because of positive feedback loop: large load
>>    delta -> heavier load migration -> larger imbalance and the cycle
>>    goes on.
>>
>> 2. severe imbalance in CPU queue depth.  This causes really long
>>    scheduling latency blips, which badly affect applications that
>>    have tight latency requirements.
>>
>> The fix is to have kernel calculate domain avg_load in both cases.
>> This will ensure that imbalance calculation is always sensible and
>> the target is usually half way between busiest and puller CPU.
>
> Indeed so, it looks like I broke that in 866ab43efd32. Out of curiosity,
> what kind of workload did you observe this on?

This was observed on an application that serves websearch queries.  CPU
queue depth was uneven across the system, which led to a long query
latency tail.  The tail latencies both occurred frequently and were
stretched out in time.

With this fix, both server throughput and response latency improved.
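
To make the arithmetic in the changelog concrete, here is a toy two-CPU
sketch.  It is not the kernel code: toy_avg_load() and toy_imbalance()
are hypothetical helpers made up for illustration, and the sketch
assumes the pull amount is simply max_load minus avg_load, ignoring
everything else the real calculate_imbalance() does.

```c
#include <assert.h>

/*
 * Toy model only, NOT the kernel implementation.
 * Both helpers are hypothetical and assume a domain with exactly
 * two CPUs: the busiest one and the puller.
 */

/* Domain average load for the two-CPU toy domain. */
static unsigned long toy_avg_load(unsigned long busiest_load,
                                  unsigned long puller_load)
{
    return (busiest_load + puller_load) / 2;
}

/*
 * Assumed simplification: the amount to pull is whatever the
 * busiest CPU carries above the domain average.
 */
static unsigned long toy_imbalance(unsigned long max_load,
                                   unsigned long avg_load)
{
    assert(max_load >= avg_load);   /* guard against unsigned underflow */
    return max_load - avg_load;
}
```

With busiest load 3072 and puller load 1024, toy_imbalance(3072, 0)
yields 3072 when avg_load is left at 0, i.e. the entire load of the
busiest CPU.  With the average computed, toy_avg_load(3072, 1024) is
2048 and the imbalance is 1024, which leaves both CPUs at 2048, half
way between the two.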

- Ken