Message-ID: <a80da288-4697-28eb-ee30-9d8ef10738f3@applied-asynchrony.com>
Date: Mon, 11 Apr 2022 10:05:19 +0200
From: Holger Hoffstätte <holger@...lied-asynchrony.com>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: Qais Yousef <qais.yousef@....com>, linux-kernel@...r.kernel.org,
linux-tip-commits@...r.kernel.org,
Abhijeet Dharmapurikar <adharmap@...cinc.com>,
Valentin Schneider <valentin.schneider@....com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
"Steven Rostedt (Google)" <rostedt@...dmis.org>, x86@...nel.org,
stable@...r.kernel.org
Subject: Re: [tip: sched/core] sched/tracing: Don't re-read p->state when
emitting sched_switch event
On 2022-04-11 09:28, Greg KH wrote:
> On Mon, Apr 11, 2022 at 09:18:19AM +0200, Holger Hoffstätte wrote:
>> On 2022-04-11 01:22, Holger Hoffstätte wrote:
>>> On 2022-04-11 00:06, Qais Yousef wrote:
>>>> On 04/10/22 00:38, Qais Yousef wrote:
>>>>> On 03/08/22 18:51, Qais Yousef wrote:
>>>>>> On 03/08/22 19:10, Greg KH wrote:
>>>>>>> On Tue, Mar 08, 2022 at 06:02:40PM +0000, Qais Yousef wrote:
>>>>>>>> +CC stable
>>>>>>>>
>>>>>>>> On 03/01/22 15:24, tip-bot2 for Valentin Schneider wrote:
>>>>>>>>> The following commit has been merged into the sched/core branch of tip:
>>>>>>>>>
>>>>>>>>> Commit-ID: fa2c3254d7cfff5f7a916ab928a562d1165f17bb
>>>>>>>>> Gitweb: https://git.kernel.org/tip/fa2c3254d7cfff5f7a916ab928a562d1165f17bb
>>>>>>>>> Author: Valentin Schneider <valentin.schneider@....com>
>>>>>>>>> AuthorDate: Thu, 20 Jan 2022 16:25:19
>>>>>>>>> Committer: Peter Zijlstra <peterz@...radead.org>
>>>>>>>>> CommitterDate: Tue, 01 Mar 2022 16:18:39 +01:00
>>>>>>>>>
>>>>>>>>> sched/tracing: Don't re-read p->state when emitting sched_switch event
>>>>>>>>>
>>>>>>>>> As of commit
>>>>>>>>>
>>>>>>>>> c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
>>>>>>>>>
>>>>>>>>> the following sequence becomes possible:
>>>>>>>>>
>>>>>>>>> p->__state = TASK_INTERRUPTIBLE;
>>>>>>>>> __schedule()
>>>>>>>>> deactivate_task(p);
>>>>>>>>> ttwu()
>>>>>>>>> READ !p->on_rq
>>>>>>>>> p->__state=TASK_WAKING
>>>>>>>>> trace_sched_switch()
>>>>>>>>> __trace_sched_switch_state()
>>>>>>>>> task_state_index()
>>>>>>>>> return 0;
>>>>>>>>>
>>>>>>>>> TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
>>>>>>>>> the trace event.
>>>>>>>>>
>>>>>>>>> Prevent this by pushing the value read from __schedule() down the trace
>>>>>>>>> event.
>>>>>>>>>
>>>>>>>>> Reported-by: Abhijeet Dharmapurikar <adharmap@...cinc.com>
>>>>>>>>> Signed-off-by: Valentin Schneider <valentin.schneider@....com>
>>>>>>>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>>>>>>>> Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>
>>>>>>>>> Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com
>>>>>>>>
>>>>>>>> Any objection to picking this for stable? I'm interested in this one for some
>>>>>>>> Android users but prefer if it can be taken by stable rather than backport it
>>>>>>>> individually.
>>>>>>>>
>>>>>>>> I think it makes sense to pick the next one in the series too.
>>>>>>>
>>>>>>> What commit does this fix in Linus's tree?
>>>>>>
>>>>>> It should be this one: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
>>>>>
>>>>> Should this be okay to be picked up by stable now? I can see AUTOSEL has picked
>>>>> it up for v5.15+, but it impacts v5.10 too.
>>>>
>>>> commit: fa2c3254d7cfff5f7a916ab928a562d1165f17bb
>>>> subject: sched/tracing: Don't re-read p->state when emitting sched_switch event
>>>>
>>>> This patch has an impact on Android 5.10 users who experience tooling breakage.
>>>> Is it possible to include in 5.10 LTS please?
>>>>
>>>> It was already picked up for 5.15+ by AUTOSEL and only 5.10 is missing.
>>>>
>>>
>>> https://lore.kernel.org/stable/Yk2PQzynOVOzJdPo@kroah.com/
>>>
>>> However, since then further investigation (still in progress) has shown that this
>>> may have been the fault of the tool in question, so if you can verify that tracing
>>> sched still works for you with this patch in 5.15.x then by all means
>>> let's merge it.
>>
>> So it turns out the lockup is indeed the fault of the tool, which contains multiple
>> kernel-version dependent tracepoint definitions and now fails with this
>> patch.
>
> What tools is this?
sysdig, which uses a helper kernel module that hooks tracepoints. As I just found,
it does so with copy-pasted tracepoint definitions, and those broke with this patch
because of the additional parameter in the function signature. It has been prone to
breakage forever because there is no stable kernel ABI.
Took me a while to find/figure out, but IMHO better safe than sorry. We've had
autoselected scheduler patches before that looked fine but really were not.
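
For illustration only, here is a minimal userspace sketch of how an out-of-tree probe might carry both signatures behind a build-time switch. Nothing in it is taken from sysdig: `HAVE_SCHED_SWITCH_PREV_STATE`, `probe_sched_switch`, `last_prev_state` and `demo_dispatch` are hypothetical names, `struct task_struct` is a two-field stand-in, and the position of the extra `prev_state` parameter is an assumption that should be checked against the tracepoint definition in the tree being built. A feature switch is used instead of a plain version check because the patch is being backported, so the kernel version alone may not say which signature is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct task_struct (assumption: only the
 * fields this sketch needs). */
struct task_struct {
	int pid;
	unsigned int __state;
};

/* HYPOTHETICAL switch: the module's build system would set this to 1 when
 * the target tree carries the patch, whether from mainline or a backport. */
#ifndef HAVE_SCHED_SWITCH_PREV_STATE
#define HAVE_SCHED_SWITCH_PREV_STATE 0
#endif

/* Records what the probe saw, so the two variants can be compared. */
static unsigned int last_prev_state;

#if HAVE_SCHED_SWITCH_PREV_STATE
/* Patched signature: the state is handed over by the caller.
 * The parameter position is an assumption of this sketch. */
static void probe_sched_switch(void *data, bool preempt,
			       unsigned int prev_state,
			       struct task_struct *prev,
			       struct task_struct *next)
{
	(void)data; (void)preempt; (void)next;
	assert(prev != NULL);
	last_prev_state = prev_state;
}
#else
/* Unpatched signature: the probe has to read the state from prev itself. */
static void probe_sched_switch(void *data, bool preempt,
			       struct task_struct *prev,
			       struct task_struct *next)
{
	(void)data; (void)preempt; (void)next;
	assert(prev != NULL);
	last_prev_state = prev->__state;
}
#endif

/* Fakes one context switch and returns the state the probe recorded. */
unsigned int demo_dispatch(void)
{
	struct task_struct prev = { .pid = 1, .__state = 0x0001 };
	struct task_struct next = { .pid = 2, .__state = 0 };

#if HAVE_SCHED_SWITCH_PREV_STATE
	probe_sched_switch(NULL, false, prev.__state, &prev, &next);
#else
	probe_sched_switch(NULL, false, &prev, &next);
#endif
	return last_prev_state;
}
```

A probe compiled with the wrong variant would be called with arguments in positions it does not expect, which is the kind of mismatch a copied definition can run into once the signature changes.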
>
>> Greg, please re-enqueue this patch where necessary (5.10, 5.15+)
>
> If I queue it up again, will the tools keep breaking?
Yes, but that's their problem with an out-of-tree module; a few more #ifdefs
are not going to make a big difference.
thanks
Holger