Date: Mon, 1 Jul 2024 16:34:18 +0200
From: Markus Schneider-Pargmann <msp@...libre.com>
To: Linux regressions mailing list <regressions@...ts.linux.dev>
Cc: Matthias Schiffer <matthias.schiffer@...tq-group.com>, 
	Marc Kleine-Budde <mkl@...gutronix.de>, Chandrasekar Ramakrishnan <rcsekar@...sung.com>, 
	Vincent Mailhol <mailhol.vincent@...adoo.fr>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Tony Lindgren <tony@...mide.com>, Judith Mendez <jm@...com>, 
	linux-can@...r.kernel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux@...tq-group.com
Subject: Re: Kernel hang caused by commit "can: m_can: Start/Cancel polling
 timer together with interrupts"

On Mon, Jul 01, 2024 at 02:12:55PM GMT, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing the regression list, as it should be in the loop for regressions:
> https://docs.kernel.org/admin-guide/reporting-regressions.html]
> 
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> Hmm, looks like there was not even a single reply to the regression
> report below. But it also seems Markus hasn't posted anything archived on
> Lore for about three weeks now, so he might be on vacation.
> 
> Marc, do you maybe have an idea what's wrong with the culprit? Or do we
> expect Markus to be back in action soon?

Great, ping here.

@Matthias: Thanks for debugging and sorry for breaking it. If you have a
fix for this, let me know. I have a lot of work right now, so I am not
sure when I will have a proper fix ready. But it is on my todo list.

Best,
Markus

> 
> Ciao, Thorsten
> 
> On 18.06.24 18:12, Matthias Schiffer wrote:
> > Hi Markus,
> > 
> > we've found that recent kernels hang on the TI AM62x SoC (where no m_can interrupt is available and
> > thus the polling timer is used), always a few seconds after the CAN interfaces are set up.
> > 
> > I have bisected the issue to commit a163c5761019b ("can: m_can: Start/Cancel polling timer together
> > with interrupts"). Both master and 6.6 stable (which received a backport of the commit) are
> > affected. On 6.6 the commit is easy to revert, but on master a lot has happened on top of that
> > change.
> > 
> > As far as I can tell, the reason is that hrtimer_cancel() tries to cancel the timer synchronously,
> > which will deadlock when called from the hrtimer callback itself (hrtimer_callback -> m_can_isr ->
> > m_can_disable_all_interrupts -> hrtimer_cancel).
> > 
> > I can try to come up with a fix, but I think you are much more familiar with the driver code. Please
> > let me know if you need any more information.
> > 
> > Best regards,
> > Matthias
> > 
> > 
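
For readers following the thread, the deadlock described in the quoted report
(hrtimer_cancel() reached from inside the timer's own callback) can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions: every model_* name below is hypothetical and is not a kernel or
m_can API. The only thing taken from the kernel is the documented return
convention of hrtimer_try_to_cancel() (-1 while the callback is running, 0 if
the timer was not active, 1 if it was active and got cancelled). The real
synchronous cancel waits without bound; the model caps the retries so the
stuck case can be observed instead of hanging. The second callback shows one
possible way to stop a timer from inside its callback, by returning a
"no restart" value; it is not claimed to be the fix the driver will use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
enum model_restart { MODEL_NORESTART, MODEL_RESTART };

struct model_timer {
	bool armed;            /* timer is queued and will fire */
	bool callback_running; /* callback is executing right now */
	enum model_restart (*fn)(struct model_timer *t);
};

/*
 * Follows the return convention of hrtimer_try_to_cancel():
 * -1 callback is running, 0 timer not active, 1 timer was active.
 */
int model_try_to_cancel(struct model_timer *t)
{
	assert(t != NULL);
	if (t->callback_running)
		return -1;
	if (!t->armed)
		return 0;
	t->armed = false;
	return 1;
}

/*
 * Synchronous cancel: retry until the callback has finished.
 * The model gives up after max_spins and returns false, standing in
 * for a wait that would otherwise never end.
 */
bool model_cancel_sync(struct model_timer *t, int max_spins)
{
	for (int i = 0; i < max_spins; i++)
		if (model_try_to_cancel(t) >= 0)
			return true;
	return false;
}

/* Fire the timer once and honour the callback's restart request. */
void model_fire(struct model_timer *t)
{
	t->armed = false;
	t->callback_running = true;
	enum model_restart r = t->fn(t);
	t->callback_running = false;
	if (r == MODEL_RESTART)
		t->armed = true;
}

bool buggy_cancel_completed;

/* Problem case: the callback cancels its own timer synchronously. */
enum model_restart buggy_callback(struct model_timer *t)
{
	buggy_cancel_completed = model_cancel_sync(t, 1000);
	return MODEL_RESTART;
}

/* Alternative: the callback asks not to be restarted and never waits. */
enum model_restart norestart_callback(struct model_timer *t)
{
	(void)t;
	return MODEL_NORESTART;
}
```

In the model, firing a timer whose callback is buggy_callback leaves
buggy_cancel_completed false, because the cancel can never succeed while the
callback that issued it is still running. Firing a timer with
norestart_callback simply leaves it unarmed afterwards.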
