linux-kernel - Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c251dcde-6c01-65a7-5745-c16a10d5fe1f@linux.microsoft.com>
Date:   Mon, 31 Aug 2020 09:01:11 -0400
From:   Vineeth Pillai <viremana@...ux.microsoft.com>
To:     peterz@...radead.org
Cc:     Julien Desfossez <jdesfossez@...italocean.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Dhaval Giani <dhaval.giani@...cle.com>,
        Chris Hyser <chris.hyser@...cle.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        mingo@...nel.org, tglx@...utronix.de, pjt@...gle.com,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
        Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>, joel@...lfernandes.org,
        vineeth@...byteword.org, Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Agata Gruza <agata.gruza@...el.com>,
        Antonio Gomez Iglesias <antonio.gomez.iglesias@...el.com>,
        graf@...zon.com, konrad.wilk@...cle.com, dfaggioli@...e.com,
        rostedt@...dmis.org, derkling@...gle.com, benbjiang@...cent.com,
        Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Aaron Lu <aaron.lu@...ux.alibaba.com>
Subject: Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and
 scheduling.



On 8/29/20 3:47 AM, peterz@...radead.org wrote:
>> During hotplug stress test, we have noticed that while a sibling is in
>> pick_next_task, another sibling can go offline or come online. What
>> we have observed is smt_mask get updated underneath us even if
>> we hold the lock. From reading the code, looks like we don't hold the
>> rq lock when the mask is updated. This extra logic was to take care of that.
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
One issue that we observed was with the inconsistent view of smt_mask
in the three loops in pick_next_task can lead to corner cases. The first 
loop
resets all coresched fields, second loop picks the task for each 
siblings and
then if the sibling came online in the third loop, we IPI it and it 
crashes for a
NULL core_pick. Similarly I think we might have issues if the sibling goes
offline in the last loop or sibling is offline/online in the pick loop.

It might be possible to do specific checks for core_pick in the loops to 
fix the
corner cases above. But I was not sure if there were more and hence took 
this
approach. I understand that cpumask_weight is expensive and will try to 
fix it
using core_pick to fix the corner cases.

>
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
>> Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
>> detail the individual hotplug changes squashed into this patch:
>> https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
>> https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
I think there are couple of scenarios where pick_seq per sibling will be
useful. One is this hotplug case, where you need to pick only for the online
siblings and the offline siblings can come online async. Second case that
we have seen is, we don't need to mark pick for siblings when we pick a task
which is currently running. Marking the pick core wide will make the sibling
take the fast path and re-select the same task during a schedule event
due to preemption or its time slice expiry.

>> https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
>
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
>
> /me re-reads the hotplug code...
>
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.
Yes, we get called into schedule(for going idle) for an offline cpu and 
it gets
confused. Also I think there could be problems while it comes online as 
well,
like I mentioned above. We might IPI a sibling which just came online 
with a
NULL core_pick. But I think we can fix it with specific checks for 
core_pick.

I shall look into the smt mask update side of the hotplug again and see
if corner cases could be better handled there.

Thanks,
Vineeth