[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b7ad9cb9-4fed-4035-a544-a4d827662c92@amd.com>
Date: Fri, 27 Dec 2024 10:24:26 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chengming Zhou <chengming.zhou@...ux.dev>, Johannes Weiner
<hannes@...xchg.org>, Suren Baghdasaryan <surenb@...gle.com>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
<linux-kernel@...r.kernel.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Chengming Zhou
<zhouchengming@...edance.com>, Muchun Song <muchun.song@...ux.dev>, "Gautham
R. Shenoy" <gautham.shenoy@....com>, Chuyi Zhou <zhouchuyi@...edance.com>
Subject: Re: [PATCH] psi: Fix race when task wakes up before
psi_sched_switch() adjusts flags
On 12/27/2024 10:10 AM, Chengming Zhou wrote:
> On 2024/12/27 12:10, K Prateek Nayak wrote:
>> Hello there,
>>
> [...]
>>>
>>> Just made a quick fix and tested passed using your script.
>>
>> Thank you! The diff seems to be malformed as a result of whitespaces but
>> I was able to test if by recreating the diff. Feel free to add:
>>
>> Reported-by: K Prateek Nayak <kprateek.nayak@....com>
>> Closes: https://lore.kernel.org/lkml/20241226053441.1110-1- kprateek.nayak@....com/
>> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>>
>> If you can give your sign off, I could add a commit message and send it on
>> your behalf too.
>
> Great, thanks for your time!
>
> Signed-off-by: Chengming Zhou <chengming.zhou@...ux.dev>
Thank you!
>
>>
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 3e5a6bf587f9..065ac76c47f9 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -6641,7 +6641,6 @@ static void __sched notrace __schedule(int sched_mode)
>>> * as a preemption by schedule_debug() and RCU.
>>> */
>>> bool preempt = sched_mode > SM_NONE;
>>> - bool block = false;
>>> unsigned long *switch_count;
>>> unsigned long prev_state;
>>> struct rq_flags rf;
>>> @@ -6702,7 +6701,7 @@ static void __sched notrace __schedule(int sched_mode)
>>> goto picked;
>>> }
>>> } else if (!preempt && prev_state) {
>>> - block = try_to_block_task(rq, prev, prev_state);
>>> + try_to_block_task(rq, prev, prev_state);
>>> switch_count = &prev->nvcsw;
>>> }
>>>
>>> @@ -6748,7 +6747,8 @@ static void __sched notrace __schedule(int sched_mode)
>>>
>>> migrate_disable_switch(rq, prev);
>>> psi_account_irqtime(rq, prev, next);
>>> - psi_sched_switch(prev, next, block);
>>> + psi_sched_switch(prev, next, !task_on_rq_queued(prev) ||
>>> + prev->se.sched_delayed);
>>>
>>> trace_sched_switch(preempt, prev, next, prev_state);
>>>
>>> diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
>>> index 8ee0add5a48a..65efe45fcc77 100644
>>> --- a/kernel/sched/stats.h
>>> +++ b/kernel/sched/stats.h
>>> @@ -150,7 +150,7 @@ static inline void psi_enqueue(struct task_struct *p, int flags)
>>> set = TSK_RUNNING;
>>> if (p->in_memstall)
>>> set |= TSK_MEMSTALL | TSK_MEMSTALL_RUNNING;
>>> - } else {
>>> + } else if (!task_on_cpu(task_rq(p), p)) {
>>
>> One small nit. here
>>
>> If the task is on CPU at this point, both set and clear are 0 but
>> psi_task_change() is still called and I don't see it bailing out if it
>> doesn't have to adjust any flags.
>
> Yes.
>
>>
>> Can we instead just do an early return if task_on_cpu(task_rq(p), p)
>> returns true? I've tested that version too and I haven't seen any
>> splats.
>
> I thought it's good to preserve the current flow that:
>
> if (restore)
> return;
>
> if (migrate)
> ...
> else if (wakeup)
> ...
>
> As for early return when `task_on_cpu()`, it looks right to me.
Thank you for your feedback. I agree the current approach fits the flow
well but the call to psi_task_change() which does a whole lot although
the flags remain same (like seq_count updates, etc.) seems unnecessary.
> Anyway, it's not a migrate or wakeup from PSI POV.
>
> Thanks!
>
>>
>>> /* Wakeup of new or sleeping task */
>>> if (p->in_iowait)
>>> clear |= TSK_IOWAIT;
>>
--
Thanks and Regards,
Prateek
Powered by blists - more mailing lists