Message-ID: <20241216141303.2zr5klbgua55agkx@thinkpad>
Date: Mon, 16 Dec 2024 19:43:03 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Johan Hovold <johan@...nel.org>
Cc: mhi@...ts.linux.dev, linux-arm-msm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Loic Poulain <loic.poulain@...aro.org>
Subject: Re: mhi resume failure on reboot with 6.13-rc2

On Mon, Dec 16, 2024 at 02:20:09PM +0100, Johan Hovold wrote:
> On Mon, Dec 16, 2024 at 01:10:21PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Dec 11, 2024 at 04:03:59PM +0100, Johan Hovold wrote:
> > > On Wed, Dec 11, 2024 at 08:23:15PM +0530, Manivannan Sadhasivam wrote:
> > > > On Wed, Dec 11, 2024 at 03:17:22PM +0100, Johan Hovold wrote:
> > > 
> > > > > I just hit the following modem related error on reboot of the x1e80100
> > > > > CRD for the second time with 6.13-rc2:
> > > > > 
> > > > > 	[  138.348724] shutdown[1]: Rebooting.
> > > > >         [  138.545683] arm-smmu 3da0000.iommu: disabling translation
> > > > >         [  138.582505] mhi mhi0: Resuming from non M3 state (SYS ERROR)
> > > > >         [  138.588516] mhi-pci-generic 0005:01:00.0: failed to resume device: -22
> > > > >         [  138.595375] mhi-pci-generic 0005:01:00.0: device recovery started
> > > > >         [  138.603841] wwan wwan0: port wwan0qcdm0 disconnected
> > > > >         [  138.609508] wwan wwan0: port wwan0mbim0 disconnected
> > > > >         [  138.615137] wwan wwan0: port wwan0qmi0 disconnected
> > > > >         [  138.702604] mhi mhi0: Requested to power ON
> > > > >         [  139.027494] mhi mhi0: Power on setup success
> > > > >         [  139.027640] mhi mhi0: Wait for device to enter SBL or Mission mode
> > > > > 
> > > > > and then the machine hangs.
> 
> > Could be. But the issue seems to be stemming from the modem crash while exiting
> > M3. You can try removing the modem autosuspend by skipping the if condition
> > block:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/host/pci_generic.c?h=v6.13-rc1#n1184
> > 
> > If you no longer see the crash, then the issue might be with modem not coping
> > up with autosuspend. If you still see the crash, then something else going wrong
> > during reboot/power off.
> 
> I've only hit this issue three times and only since 6.13-rc2. So not
> sure how useful that sort of experiment would be.
> 

I do not have access to the device. So if you cannot spend time debugging the
reason for the crash, then I'll have to rely on Qcom to do it (which I've asked
for anyway).

> > > Is there anything you can do on the mhi side to prevent it from blocking
> > > reboot/power off?
> > 
> > It should not block the reboot/power off forever. There is a timeout waiting for
> > SBL/Mission mode and the max time is 24s (depending on the modem). Can you share
> > the modem VID:PID?
> 
> I just hit the issue again and can confirm that it does block
> reboot/shutdown forever (I've been waiting for 20 minutes now).
> 

Ah, that's bad.

> Judging from a quick look at the code, "Wait for device to enter SBL or
> Mission mode" is printed by mhi_fw_load_handler(), which in turn is only
> called from the mhi_pm_st_worker() state machine.
> 
> I can't seem to find anything that makes sure that the next state is
> ever reached, so regardless of the cause of the modem fw crash

This code will make sure:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/host/pm.c?h=v6.13-rc1#n1264

But then it doesn't print the error and returns -ETIMEDOUT to the caller after
powering down MHI. The caller (mhi_pci_recovery_work), in the case of failure,
unprepares MHI and starts function level recovery.
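
To make the flow above concrete, here is a minimal userspace sketch in C of a
bounded wait that powers the device down and returns -ETIMEDOUT when the
expected state is never reached. It is not the kernel code behind the link:
every name in it (sketch_dev, sketch_sync_power_up, the tick-based timeout) is
hypothetical, and the modem is simulated by a simple counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical execution environments, loosely modelled on the thread. */
enum sketch_ee { SKETCH_EE_PBL, SKETCH_EE_SBL, SKETCH_EE_MISSION };

/* Hypothetical device model; not a real MHI structure. */
struct sketch_dev {
	enum sketch_ee ee;
	int ticks_until_ready;	/* <= 0 means the device never becomes ready */
	bool powered;
};

static bool sketch_ready(const struct sketch_dev *d)
{
	return d->ee == SKETCH_EE_SBL || d->ee == SKETCH_EE_MISSION;
}

/* Simulate one polling interval passing on the device side. */
static void sketch_tick(struct sketch_dev *d)
{
	if (d->ticks_until_ready > 0 && --d->ticks_until_ready == 0)
		d->ee = SKETCH_EE_MISSION;
}

/*
 * Power on, then wait at most timeout_ticks for SBL or Mission mode.
 * On timeout, power the device down and report -ETIMEDOUT so that the
 * caller can start its own recovery instead of waiting forever.
 */
int sketch_sync_power_up(struct sketch_dev *d, int timeout_ticks)
{
	d->powered = true;

	for (int i = 0; i < timeout_ticks; i++) {
		if (sketch_ready(d))
			return 0;
		sketch_tick(d);
	}

	if (sketch_ready(d))
		return 0;

	d->powered = false;	/* power down before returning the error */
	return -ETIMEDOUT;
}
```

With a simulated device that never leaves PBL, sketch_sync_power_up() returns
-ETIMEDOUT after timeout_ticks polls and leaves the device powered off, which
is the behaviour a caller needs in order to fall back to recovery.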

> (if
> that's what it is) the hung reboot appears to be a bug in mhi.
> 

I'm not sure where exactly it got stuck. I've asked Qcom folks to reproduce this
issue. We will investigate and hopefully get back with a fix asap.

> This is with the SDX65 modem in the x1e80100 CRD:
> 	
> 	17cb:0308

Okay thanks!

- Mani

-- 
மணிவண்ணன் சதாசிவம்
