Message-ID: <81b5bab9-1347-a2cf-dcd3-2ec1e451cef3@arm.com>
Date:   Tue, 5 Apr 2022 17:16:38 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Peter Zijlstra <peterz@...radead.org>,
        "T.J. Alumbaugh" <talumbau@...omium.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        joel@...lfernandes.org
Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 05/04/2022 09:48, Peter Zijlstra wrote:
> On Mon, Apr 04, 2022 at 04:17:54PM -0400, T.J. Alumbaugh wrote:
>>
>> On 3/29/22 17:22, Steven Rostedt wrote:
>>> On Mon, 21 Mar 2022 13:30:37 -0400
>>> Steven Rostedt <rostedt@...dmis.org> wrote:
>>>
>>>> On Wed, 16 Mar 2022 22:03:41 +0100
>>>> Peter Zijlstra <peterz@...radead.org> wrote:
>>>>
>>>>> Does something like the below (untested in the extreme) help?
>>>> Hi Peter,
>>>>
>>>> This has been tested extensively by the ChromeOS team and said that it does
>>>> appear to fix the problem.
>>>>
>>>> Could you get this into mainline, and tag it for stable so that it can be
>>>> backported to the appropriate stable releases?
>>>>
>>>> Thanks for the fix!
>>>>
>>> Hi Peter,
>>>
>>> I just don't want you to forget about this :-)
>>>
>>> -- Steve
>>>
>> Hi Peter,
>>
>> Just a note that if/when you send this out as a patch, feel free to add:
>>
>> Tested-by: T.J. Alumbaugh <talumbau@...omium.org>
> 
> https://lkml.kernel.org/r/20220330160535.GN8939@worktop.programming.kicks-ass.net

I still wonder if this issue happened on a system w/o:

     565790d28b1e ("sched: Fix balance_callback()")

Maybe chromeos-5.10 or earlier? In this case applying 565790d28b1e could
fix it as well.

The reason why I think the original issue happened on a system w/o
565790d28b1e is the call-stack in:

https://lkml.kernel.org/r/20220315174606.02959816@gandalf.local.home

[56064.673346] Call Trace:
[56064.676066]  dump_stack+0xb9/0x117
[56064.679861]  ? print_usage_bug+0x2af/0x2c2
[56064.684434]  mark_lock_irq+0x25e/0x27d
[56064.688618]  mark_lock+0x11a/0x16c
[56064.692412]  mark_held_locks+0x57/0x87
[56064.696595]  ? _raw_spin_unlock_irq+0x2c/0x40
[56064.701460]  lockdep_hardirqs_on+0xb1/0x19d
[56064.706130]  _raw_spin_unlock_irq+0x2c/0x40
[56064.710799]  sched_core_balance+0x8a/0x4af
[56064.715369]  ? __balance_callback+0x1f/0x9a        <--- !!!
[56064.720030]  __balance_callback+0x4f/0x9a
[56064.724506]  rt_mutex_setprio+0x43a/0x48b
[56064.728982]  task_blocks_on_rt_mutex+0x14d/0x1d5

This trace contains __balance_callback(), i.e. the name without the trailing 's'.

565790d28b1e changes __balance_callback() to __balance_callbacks()
                                                               ^
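
The check above can be sketched as a small script. This is only an illustration: it assumes the symbol name in the trace is a reliable hint for whether 565790d28b1e is present, which may not hold for trees with partial backports. The helper names (frame_symbols, classify) and the return strings are made up for this example.

```python
import re

# Matches a stack frame such as "[56064.720030]  __balance_callback+0x4f/0x9a".
# Frames marked unreliable by the unwinder carry a leading "? ".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Excerpt of the trace quoted in this mail.
TRACE = """\
[56064.706130]  _raw_spin_unlock_irq+0x2c/0x40
[56064.710799]  sched_core_balance+0x8a/0x4af
[56064.715369]  ? __balance_callback+0x1f/0x9a        <--- !!!
[56064.720030]  __balance_callback+0x4f/0x9a
[56064.724506]  rt_mutex_setprio+0x43a/0x48b
"""


def frame_symbols(trace):
    """Return the function names of all frames found in the trace text."""
    symbols = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(2))
    return symbols


def classify(trace):
    """Heuristic only: guess from symbol names whether 565790d28b1e is applied."""
    symbols = set(frame_symbols(trace))
    if "__balance_callbacks" in symbols:
        return "likely has 565790d28b1e"
    if "__balance_callback" in symbols:
        return "likely lacks 565790d28b1e"
    return "unknown"


if __name__ == "__main__":
    print(classify(TRACE))
```

Matching the whole symbol name matters here, since the two names differ only by the final 's'; a plain substring search for "__balance_callback" would match both.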
