Message-ID: <81b2288a-579d-8dd1-f179-d672cf1edd68@oracle.com>
Date:   Mon, 1 Jul 2019 17:01:07 -0700
From:   Subhra Mazumdar <subhra.mazumdar@...cle.com>
To:     Patrick Bellasi <patrick.bellasi@....com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com, tglx@...utronix.de,
        steven.sistare@...cle.com, dhaval.giani@...cle.com,
        daniel.lezcano@...aro.org, vincent.guittot@...aro.org,
        viresh.kumar@...aro.org, tim.c.chen@...ux.intel.com,
        mgorman@...hsingularity.net, Paul Turner <pjt@...gle.com>,
        riel@...riel.com, morten.rasmussen@....com
Subject: Re: [RESEND PATCH v3 0/7] Improve scheduler scalability for fast path


On 7/1/19 6:55 AM, Patrick Bellasi wrote:
> On 01-Jul 11:02, Peter Zijlstra wrote:
>> On Wed, Jun 26, 2019 at 06:29:12PM -0700, subhra mazumdar wrote:
>>> Hi,
>>>
>>> Resending this patchset; it would be good to get some feedback. Any suggestions
>>> that would make it more acceptable are welcome. We have been shipping this
>>> with the Unbreakable Enterprise Kernel in Oracle Linux.
>>>
>>> The current select_idle_sibling first tries to find a fully idle core using
>>> select_idle_core, which can potentially search all cores; if that fails, it
>>> looks for any idle cpu using select_idle_cpu, which can potentially search
>>> all cpus in the llc domain. This doesn't scale for large llc domains and
>>> will only get worse as core counts grow.
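Roughly, the flow described above is the following (a userspace model for illustration only; the cpu_idle[] array, SMT_WIDTH and the model_* names are assumptions, not the kernel's actual code):

#include <stdbool.h>

#define NR_LLC_CPUS	64	/* assumed LLC domain size */
#define SMT_WIDTH	2	/* assumed hardware threads per core */

static bool cpu_idle[NR_LLC_CPUS];	/* true if that cpu is idle */

/* Scan every core for one whose SMT siblings are all idle; -1 if none. */
static int model_select_idle_core(void)
{
	for (int core = 0; core < NR_LLC_CPUS / SMT_WIDTH; core++) {
		bool all_idle = true;

		for (int t = 0; t < SMT_WIDTH; t++)
			if (!cpu_idle[core * SMT_WIDTH + t])
				all_idle = false;
		if (all_idle)
			return core * SMT_WIDTH;
	}
	return -1;			/* may have walked every core */
}

/* Scan every cpu in the LLC domain for any idle cpu; -1 if none. */
static int model_select_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_LLC_CPUS; cpu++)
		if (cpu_idle[cpu])
			return cpu;
	return -1;			/* may have walked every cpu */
}

/* Current behaviour: fully idle core first, then any idle cpu. */
static int model_select_idle_sibling(void)
{
	int cpu = model_select_idle_core();

	return cpu >= 0 ? cpu : model_select_idle_cpu();
}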
>>>
>>> This patch solves the scalability problem by:
>>>   - Setting an upper and lower limit of idle cpu search in select_idle_cpu
>>>     to keep search time low and constant
>>>   - Adding a new sched feature SIS_CORE to disable select_idle_core
>>>
>>> Additionally, it introduces a new per-cpu variable, next_cpu, to record where
>>> the previous search ended so that each new search starts from that point.
>>> This rotating search window over the cpus in the LLC domain ensures that
>>> idle cpus are eventually found under high load.
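The bounded, rotating search then looks roughly like the sketch below (again a userspace model, not the patch itself; the SEARCH_LOWER/SEARCH_UPPER limits and the single next_cpu slot are assumptions for illustration, the real patch keeps next_cpu per cpu). With the SIS_CORE sched feature cleared, the fully idle core scan is skipped and only this bounded scan runs:

#include <stdbool.h>

#define NR_LLC_CPUS	64	/* assumed LLC domain size */
#define SEARCH_LOWER	4	/* assumed lower limit on cpus scanned */
#define SEARCH_UPPER	16	/* assumed upper limit on cpus scanned */

static bool cpu_idle[NR_LLC_CPUS];
static int next_cpu;		/* per-cpu in the real patch; one slot here */

/* Scan at most a clamped number of cpus, starting where we last stopped. */
static int model_select_idle_cpu_bounded(int nr)
{
	int start = next_cpu;

	if (nr < SEARCH_LOWER)
		nr = SEARCH_LOWER;
	if (nr > SEARCH_UPPER)
		nr = SEARCH_UPPER;

	for (int i = 0; i < nr; i++) {
		int cpu = (start + i) % NR_LLC_CPUS;

		next_cpu = (cpu + 1) % NR_LLC_CPUS;	/* resume point for the next search */
		if (cpu_idle[cpu])
			return cpu;
	}
	return -1;	/* nothing idle inside this window */
}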
>> Right, so we had a wee conversation about this patch series at OSPM, and
>> I don't see any of that reflected here :-(
>>
>> Specifically, given that some people _really_ want the whole L3 mask
>> scanned to reduce tail latency over raw throughput, while you guys
>> prefer the other way around, it was proposed to extend the task model.
>>
>> Specifically something like a latency-nice was mentioned (IIRC) where a
> Right, AFAIR PaulT suggested adding support for the concept of a task
> being "latency tolerant": meaning we can spend more time searching for
> a CPU and/or avoid preempting the current task.
>
I wonder whether the search and preemption needs will ever conflict?
Otherwise this sounds like a good direction to me. For the search aspect, can
we map latency-nice values to the percentage of cores we search in
select_idle_cpu? The search cost could then be controlled by the latency-nice
value. The issue is that if more latency-tolerant workloads are set to search
less, we still need some mechanism to achieve a good spread of threads. Can we
keep the sliding-window mechanism in that case? Also, will latency-nice do
anything for select_idle_core and select_idle_smt?
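A rough sketch of such a mapping (the latency-nice range of [-20, 19] and the 5% floor are assumptions for illustration, not a settled interface):

/*
 * Map a per-task latency-nice value in [-20, 19] to the percentage of
 * LLC cpus select_idle_cpu is allowed to scan: latency-sensitive tasks
 * scan more, latency-tolerant tasks scan less. Purely illustrative.
 */
#define LATENCY_NICE_MIN	(-20)
#define LATENCY_NICE_MAX	19
#define SEARCH_PCT_MIN		5	/* assumed floor so some spread remains */
#define SEARCH_PCT_MAX		100

static int latency_nice_to_search_pct(int latency_nice)
{
	int range = LATENCY_NICE_MAX - LATENCY_NICE_MIN;	/* 39 */
	int span  = SEARCH_PCT_MAX - SEARCH_PCT_MIN;		/* 95 */

	/* -20 (most latency sensitive) -> 100%, +19 (most tolerant) -> 5% */
	return SEARCH_PCT_MAX - (latency_nice - LATENCY_NICE_MIN) * span / range;
}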
