linux-kernel - Re: [PATCH] sched: fix rq lock recursion issue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <649352a9-5cd4-cce6-62ae-e3f4aac18eef@quicinc.com>
Date:   Tue, 5 Jul 2022 20:46:54 -0700
From:   Satya Durga Srinivasu Prabhala <quic_satyap@...cinc.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Qais Yousef <qais.yousef@....com>
CC:     <mingo@...hat.com>, <juri.lelli@...hat.com>,
        <vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
        <rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
        <bristot@...hat.com>, <vschneid@...hat.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: fix rq lock recursion issue


On 7/1/22 1:33 AM, Peter Zijlstra wrote:
> On Thu, Jun 30, 2022 at 10:53:10PM +0100, Qais Yousef wrote:
>> Hi Satya
>>
>> On 06/24/22 00:42, Satya Durga Srinivasu Prabhala wrote:
>>> Below recursion is observed in a rare scenario where __schedule()
>>> takes rq lock, at around same time task's affinity is being changed,
>>> bpf function for tracing sched_switch calls migrate_enabled(),
>>> checks for affinity change (cpus_ptr != cpus_mask) lands into
>>> __set_cpus_allowed_ptr which tries acquire rq lock and causing the
>>> recursion bug.
>>>
>>> Fix the issue by switching to preempt_enable/disable() for non-RT
>>> Kernels.
>> Interesting bug. Thanks for the report. Unfortunately I can't see this being
>> a fix as it just limits the bug visibility to PREEMPT_RT kernels, but won't fix
>> anything, no? ie: Kernels compiled with PREEMPT_RT will still hit this failure.
> Worse, there's !RT stuff that grew to rely on the preemptible migrate
> disable stuff, so this actively breaks things.
Sorry about that. I'm cross checking further on ways to repro issue easily.
>> I'm curious how the race with set affinity is happening. I would have thought
>> user space would get blocked as __schedule() will hold the rq lock.
>>
>> Do you have more details on that?
> Yeah, I'm not seeing how this works either, in order for
> migrate_enable() to actually call __set_cpus_allowed_ptr(), it needs to
> have done migrate_disable() *before* schedule, schedule() will then have
> to call migrate_disable_swich(), and *then* migrate_enable() does this.
>
> However, if things are nicely balanced (as they should be), then
> trace_call_bpf() using migrate_disable()/migrate_enable() should never
> hit this path.
>
> If, OTOH, migrate_disable() was called prior to schedule() and we did do
> migrate_disable_switch(), then it should be impossible for the
> tracepoint/bpf stuff to reach p->migration_disabled == 0.
>
Thanks for explanation. Will cross check further on these points.
>>> -010 |spin_bug(lock = ???, msg = ???)
>>> -011 |debug_spin_lock_before(inline)
>>> -011 |do_raw_spin_lock(lock = 0xFFFFFF89323BB600)
>>> -012 |_raw_spin_lock(inline)
>>> -012 |raw_spin_rq_lock_nested(inline)
>>> -012 |raw_spin_rq_lock(inline)
>>> -012 |task_rq_lock(p = 0xFFFFFF88CFF1DA00, rf = 0xFFFFFFC03707BBE8)
>>> -013 |__set_cpus_allowed_ptr(inline)
>>> -013 |migrate_enable()
>>> -014 |trace_call_bpf(call = ?, ctx = 0xFFFFFFFDEF954600)
>>> -015 |perf_trace_run_bpf_submit(inline)
>>> -015 |perf_trace_sched_switch(__data = 0xFFFFFFE82CF0BCB8, preempt = FALSE, prev = ?, next = ?)
>>> -016 |__traceiter_sched_switch(inline)
>>> -016 |trace_sched_switch(inline)
>>> -016 |__schedule(sched_mode = ?)
>>> -017 |schedule()
>>> -018 |arch_local_save_flags(inline)
>>> -018 |arch_irqs_disabled(inline)
>>> -018 |__raw_spin_lock_irq(inline)
>>> -018 |_raw_spin_lock_irq(inline)
>>> -018 |worker_thread(__worker = 0xFFFFFF88CE251300)
>>> -019 |kthread(_create = 0xFFFFFF88730A5A80)
>>> -020 |ret_from_fork(asm)
> This doesn't clarify much. Please explain how things got to be
> unbalanced, don't ever just make a problem dissapear like this without
> understanding what the root cause is, that'll just get your reputation
> sullied.
Agreed, thanks for the comments and suggestion. Yes, I'm trying to cross
check further and find ways to repro the issue. Will get back once I find
a better way to handle the issue. I should have just tried to get
comments/feedback on the issue first instead proposing fix. Lesson 
learnt :)