Message-ID: <c251dcde-6c01-65a7-5745-c16a10d5fe1f@linux.microsoft.com>
Date: Mon, 31 Aug 2020 09:01:11 -0400
From: Vineeth Pillai <viremana@...ux.microsoft.com>
To: peterz@...radead.org
Cc: Julien Desfossez <jdesfossez@...italocean.com>,
Joel Fernandes <joelaf@...gle.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Aaron Lu <aaron.lwe@...il.com>,
Aubrey Li <aubrey.intel@...il.com>,
Dhaval Giani <dhaval.giani@...cle.com>,
Chris Hyser <chris.hyser@...cle.com>,
Nishanth Aravamudan <naravamudan@...italocean.com>,
mingo@...nel.org, tglx@...utronix.de, pjt@...gle.com,
torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
Phil Auld <pauld@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>, joel@...lfernandes.org,
vineeth@...byteword.org, Chen Yu <yu.c.chen@...el.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Agata Gruza <agata.gruza@...el.com>,
Antonio Gomez Iglesias <antonio.gomez.iglesias@...el.com>,
graf@...zon.com, konrad.wilk@...cle.com, dfaggioli@...e.com,
rostedt@...dmis.org, derkling@...gle.com, benbjiang@...cent.com,
Vineeth Remanan Pillai <vpillai@...italocean.com>,
Aaron Lu <aaron.lu@...ux.alibaba.com>
Subject: Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and
scheduling.
On 8/29/20 3:47 AM, peterz@...radead.org wrote:
>> During hotplug stress test, we have noticed that while a sibling is in
>> pick_next_task, another sibling can go offline or come online. What
>> we have observed is smt_mask get updated underneath us even if
>> we hold the lock. From reading the code, looks like we don't hold the
>> rq lock when the mask is updated. This extra logic was to take care of that.
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
One issue we observed is that an inconsistent view of smt_mask across the
three loops in pick_next_task() can lead to corner cases. The first loop
resets all coresched fields, the second loop picks a task for each
sibling, and then, if a sibling came online before the third loop, we IPI
it and it crashes on a NULL core_pick. Similarly, I think we might have
issues if a sibling goes offline during the last loop, or goes
offline/online during the pick loop.

It might be possible to fix the corner cases above with specific checks
for core_pick in the loops, but I was not sure whether there were more,
and hence took this approach. I understand that cpumask_weight is
expensive and will try to fix the corner cases using core_pick instead.
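To make the corner case concrete, here is a rough skeleton of the three
loops (core_pick and smt_mask are from the v7 patch; pick_task() here
just stands in for the class-walk pick, and everything else is
simplified, not a verbatim excerpt):

	const struct cpumask *smt_mask = cpu_smt_mask(cpu);
	int i;

	/* Loop 1: reset core scheduling state on every sibling. */
	for_each_cpu(i, smt_mask) {
		struct rq *rq_i = cpu_rq(i);

		rq_i->core_pick = NULL;
	}

	/* Loop 2: pick a task for each sibling. */
	for_each_cpu_wrap(i, smt_mask, cpu) {
		struct rq *rq_i = cpu_rq(i);

		rq_i->core_pick = pick_task(rq_i);	/* simplified */
	}

	/*
	 * Loop 3: kick siblings whose pick differs from their current
	 * task. A sibling that came online after loop 2 shows up here
	 * with core_pick still NULL, and the paths that dereference
	 * core_pick then crash.
	 */
	for_each_cpu(i, smt_mask) {
		struct rq *rq_i = cpu_rq(i);

		if (i == cpu)
			continue;
		if (rq_i->curr != rq_i->core_pick)
			resched_curr(rq_i);
	}

Since smt_mask is updated asynchronously, each loop can see a different
set of siblings.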
>
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
>> Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
>> detail the individual hotplug changes squashed into this patch:
>> https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
>> https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
I think there are a couple of scenarios where a per-sibling pick_seq will
be useful. One is this hotplug case, where we need to pick only for the
online siblings while offline siblings can come online asynchronously.
The second case we have seen is that we don't need to mark the pick for a
sibling when the task we pick for it is the one it is currently running.
Marking the pick core wide would make that sibling take the fast path and
re-select the same task at its next schedule event, due to preemption or
its time slice expiry.
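For illustration, a minimal sketch of the pick-marking loop with a
per-rq sequence counter (a per-rq core_pick_seq follows the v6+ patches
and core_task_seq is the core-wide counter from v7, but the exact names
and surrounding logic here are assumptions, not a verbatim excerpt):

	/*
	 * Mark picks per sibling instead of core wide: only a sibling
	 * whose core_pick_seq was bumped takes the fast path on its
	 * next schedule().
	 */
	for_each_cpu(i, smt_mask) {
		struct rq *rq_i = cpu_rq(i);

		/*
		 * The task we picked for this sibling is already
		 * running there: don't mark the pick, so that on
		 * preemption or time-slice expiry the sibling does a
		 * fresh core-wide pick instead of re-selecting the
		 * same task via the fast path.
		 */
		if (rq_i->curr == rq_i->core_pick)
			continue;

		rq_i->core_pick_seq = rq->core->core_task_seq;
		if (i != cpu)
			resched_curr(rq_i);
	}

With a single core-wide pick_seq there is no way to skip the marking for
just one sibling like this.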
>> https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
>
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
>
> /me re-reads the hotplug code...
>
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.
Yes, we get called into schedule() (for going idle) on an offline cpu and
it gets confused. Also, I think there could be problems while a cpu comes
online as well, like I mentioned above: we might IPI a sibling which just
came online and still has a NULL core_pick. But I think we can fix that
with specific checks for core_pick, as sketched below.

I shall look into the smt mask update side of the hotplug code again and
see if the corner cases could be better handled there.
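As an example of the kind of check I have in mind (illustrative only,
not from the posted patches), the IPI loop could simply skip a sibling
whose core_pick was never set:

	for_each_cpu(i, smt_mask) {
		struct rq *rq_i = cpu_rq(i);

		if (i == cpu)
			continue;

		/*
		 * The sibling came online after the pick loop, so
		 * nothing was picked for it; don't IPI it with a
		 * NULL core_pick.
		 */
		if (!rq_i->core_pick)
			continue;

		if (rq_i->curr != rq_i->core_pick)
			resched_curr(rq_i);
	}

The newly onlined sibling will then do a regular core-wide pick on its
own first schedule().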
Thanks,
Vineeth