Date: Tue, 2 Jul 2024 12:30:08 +0200
From: "Linux regression tracking (Thorsten Leemhuis)"
 <regressions@...mhuis.info>
To: Matthias Schiffer <matthias.schiffer@...tq-group.com>,
 Linux regressions mailing list <regressions@...ts.linux.dev>,
 Markus Schneider-Pargmann <msp@...libre.com>
Cc: Marc Kleine-Budde <mkl@...gutronix.de>,
 Chandrasekar Ramakrishnan <rcsekar@...sung.com>,
 Vincent Mailhol <mailhol.vincent@...adoo.fr>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Tony Lindgren <tony@...mide.com>, Judith Mendez <jm@...com>,
 linux-can@...r.kernel.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux@...tq-group.com
Subject: Re: Kernel hang caused by commit "can: m_can: Start/Cancel polling
 timer together with interrupts"

On 02.07.24 12:03, Matthias Schiffer wrote:
> On Tue, 2024-07-02 at 07:37 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 01.07.24 16:34, Markus Schneider-Pargmann wrote:
>>> On Mon, Jul 01, 2024 at 02:12:55PM GMT, Linux regression tracking (Thorsten Leemhuis) wrote:
>
>>> @Matthias: Thanks for debugging and sorry for breaking it. If you have a
>>> fix for this, let me know. I have a lot of work right now, so I am not
>>> sure when I will have a proper fix ready. But it is on my todo list.
>>
>> Thx. This made me wonder: is "revert the culprit to resolve this quickly
>> and reapply it later together with a fix" something that we should
>> consider if a proper fix takes some time? Or is this not worth it in
>> this case or extremely hard? Or would it cause a regression on its own
>> for users of 6.9?
> 
> I think on 6.9 a revert is not easily possible (without reverting several other commits adding new
> features), but it should be considered for 6.6.
> I don't think further regressions are possible by reverting, as on 6.6 the timer is only used for
> platforms without an m_can IRQ, and on these platforms the current behavior is "the kernel
> reproducibly deadlocks in atomic context", so there is not much room for making it worse.

Often Greg does not revert commits in stable branches when they cause
the same problem in mainline, but I suspect this case is different.
Still, I guess he would prefer to hear "please revert
887407b622f8e4 ("can: m_can: Start/Cancel polling timer together with
interrupts")" coming from Markus, hence:

Markus, if you agree that a revert from 6.6.y might be best, could you
simply ask for a revert in a reply to this mail while CCing Greg and the
stable list? tia!

Ciao, Thorsten

> Like Markus, I have writing a proper fix for this on my TODO list, but I'm not sure when I can get
> to it - hopefully next week.
> 
> Best regards,
> Matthias
> 
> 
> 
>>
>>>> On 18.06.24 18:12, Matthias Schiffer wrote:
>>>>> Hi Markus,
>>>>>
>>>>> we've found that recent kernels hang on the TI AM62x SoC (where no m_can interrupt is available and
>>>>> thus the polling timer is used), always a few seconds after the CAN interfaces are set up.
>>>>>
>>>>> I have bisected the issue to commit a163c5761019b ("can: m_can: Start/Cancel polling timer together
>>>>> with interrupts"). Both master and 6.6 stable (which received a backport of the commit) are
>>>>> affected. On 6.6 the commit is easy to revert, but on master a lot has happened on top of that
>>>>> change.
>>>>>
>>>>> As far as I can tell, the reason is that hrtimer_cancel() tries to cancel the timer synchronously,
>>>>> which will deadlock when called from the hrtimer callback itself (hrtimer_callback -> m_can_isr ->
>>>>> m_can_disable_all_interrupts -> hrtimer_cancel).
>>>>>
>>>>> I can try to come up with a fix, but I think you are much more familiar with the driver code. Please
>>>>> let me know if you need any more information.
>>>>>
>>>>> Best regards,
>>>>> Matthias
>>>>>
>>>>>
>>>
>>>
> 
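
The deadlock described in the original report quoted above (hrtimer_cancel()
reached from the hrtimer callback itself) can be illustrated with a small
userspace model. This is only a sketch under assumptions: every sim_* name
below is hypothetical and none of this is m_can driver code. It models the
documented return values of hrtimer_try_to_cancel() (1, 0, or -1 while the
callback is running) and the fact that hrtimer_cancel() keeps retrying while
it sees -1. The retry loop is bounded here so that the hang can be reported
instead of reproduced.

```c
/* Userspace model only. The sim_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct sim_timer {
	bool armed;             /* queued to fire */
	bool callback_running;  /* callback is executing right now */
	int cancel_result;      /* what the callback saw when it tried to cancel */
	bool (*fn)(struct sim_timer *t); /* returns true to re-arm the timer */
};

/* Models hrtimer_try_to_cancel(): 1 = stopped an armed timer,
 * 0 = timer was not armed, -1 = callback is currently running. */
int sim_try_to_cancel(struct sim_timer *t)
{
	if (t->callback_running)
		return -1;
	if (!t->armed)
		return 0;
	t->armed = false;
	return 1;
}

/* Models hrtimer_cancel(): retry for as long as the callback is running.
 * The real function has no retry limit; max_spins exists only so this
 * model can return -1 where the kernel would never return. */
int sim_cancel(struct sim_timer *t, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		int ret = sim_try_to_cancel(t);
		if (ret >= 0)
			return ret;
	}
	return -1;
}

/* Models timer expiry: dequeue, run the callback, re-arm if requested. */
void sim_fire(struct sim_timer *t)
{
	assert(t->armed);
	t->armed = false;
	t->callback_running = true;
	bool restart = t->fn(t);
	t->callback_running = false;
	t->armed = restart;
}

/* Problem pattern: the callback path ends in a blocking cancel of its own timer. */
bool cb_blocking_cancel(struct sim_timer *t)
{
	t->cancel_result = sim_cancel(t, 1000);
	return false;
}

/* One possible alternative: never cancel from the callback, just do not re-arm. */
bool cb_no_restart(struct sim_timer *t)
{
	t->cancel_result = 0;
	return false;
}

int blocking_cancel_from_callback(void)
{
	struct sim_timer t = { .armed = true, .fn = cb_blocking_cancel };
	sim_fire(&t);
	return t.cancel_result;
}

int blocking_cancel_from_outside(void)
{
	struct sim_timer t = { .armed = true, .fn = cb_no_restart };
	return sim_cancel(&t, 1000);
}

bool armed_after_no_restart(void)
{
	struct sim_timer t = { .armed = true, .fn = cb_no_restart };
	sim_fire(&t);
	return t.armed;
}
```

In this model, blocking_cancel_from_callback() reports -1 (the retry loop can
never succeed because the callback it waits for is its own caller), while the
same cancel issued from outside the callback succeeds. Declining to re-arm
corresponds to returning HRTIMER_NORESTART from a real hrtimer callback. It
is shown only as an illustration of a non-blocking way to stop the timer, not
as the fix for the m_can driver.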
