Message-ID: <CAMZdPi_0oiTFmgkq0hAhamVq-Noqa+jGDLZ_6yVaqHvcO+N=nA@mail.gmail.com>
Date: Mon, 16 Dec 2024 17:25:23 +0100
From: Loic Poulain <loic.poulain@...aro.org>
To: Johan Hovold <johan@...nel.org>, 
	Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc: mhi@...ts.linux.dev, linux-arm-msm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: mhi resume failure on reboot with 6.13-rc2

On Mon, 16 Dec 2024 at 15:13, Manivannan Sadhasivam
<manivannan.sadhasivam@...aro.org> wrote:
>
> On Mon, Dec 16, 2024 at 02:20:09PM +0100, Johan Hovold wrote:
> > On Mon, Dec 16, 2024 at 01:10:21PM +0530, Manivannan Sadhasivam wrote:
> > > On Wed, Dec 11, 2024 at 04:03:59PM +0100, Johan Hovold wrote:
> > > > On Wed, Dec 11, 2024 at 08:23:15PM +0530, Manivannan Sadhasivam wrote:
> > > > > On Wed, Dec 11, 2024 at 03:17:22PM +0100, Johan Hovold wrote:
> > > >
> > > > > > I just hit the following modem related error on reboot of the x1e80100
> > > > > > CRD for the second time with 6.13-rc2:
> > > > > >
> > > > > >       [  138.348724] shutdown[1]: Rebooting.
> > > > > >         [  138.545683] arm-smmu 3da0000.iommu: disabling translation
> > > > > >         [  138.582505] mhi mhi0: Resuming from non M3 state (SYS ERROR)
> > > > > >         [  138.588516] mhi-pci-generic 0005:01:00.0: failed to resume device: -22
> > > > > >         [  138.595375] mhi-pci-generic 0005:01:00.0: device recovery started
> > > > > >         [  138.603841] wwan wwan0: port wwan0qcdm0 disconnected
> > > > > >         [  138.609508] wwan wwan0: port wwan0mbim0 disconnected
> > > > > >         [  138.615137] wwan wwan0: port wwan0qmi0 disconnected
> > > > > >         [  138.702604] mhi mhi0: Requested to power ON
> > > > > >         [  139.027494] mhi mhi0: Power on setup success
> > > > > >         [  139.027640] mhi mhi0: Wait for device to enter SBL or Mission mode
> > > > > >
> > > > > > and then the machine hangs.
> >
> > > Could be. But the issue seems to be stemming from the modem crash while exiting
> > > M3. You can try removing the modem autosuspend by skipping the if condition
> > > block:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/host/pci_generic.c?h=v6.13-rc1#n1184
> > >
> > > If you no longer see the crash, then the issue might be with the modem not
> > > coping with autosuspend. If you still see the crash, then something else is
> > > going wrong during reboot/power off.
> >
> > I've only hit this issue three times and only since 6.13-rc2. So not
> > sure how useful that sort of experiment would be.
> >
>
> I do not have access to the device. So if you cannot spend time on debugging the
> reason for the crash, then I'll have to rely on Qcom to do it (which I've asked
> anyway).
>
> > > > Is there anything you can do on the mhi side to prevent it from blocking
> > > > reboot/power off?
> > >
> > > It should not block the reboot/power off forever. There is a timeout waiting for
> > > SBL/Mission mode and the max time is 24s (depending on the modem). Can you share
> > > the modem VID:PID?
> >
> > I just hit the issue again and can confirm that it does block
> > reboot/shutdown forever (I've been waiting for 20 minutes now).
> >
>
> Ah, that's bad.
>
> > Judging from a quick look at the code, "Wait for device to enter SBL or
> > Mission mode" is printed by mhi_fw_load_handler(), which in turn is only
> > called from the mhi_pm_st_worker() state machine.
> >
> > I can't seem to find anything that makes sure that the next state is
> > ever reached, so regardless of the cause of the modem fw crash
>
> This code will make sure:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/host/pm.c?h=v6.13-rc1#n1264
>
> But then it doesn't print the error and returns -ETIMEDOUT to the caller after
> powering down MHI. The caller (mhi_pci_recovery_work), in the case of failure,
> unprepares MHI and starts function level recovery.
>
> > (if
> > that's what it is) the hung reboot appears to be a bug in mhi.
> >
>
> I'm not sure where exactly it got stuck. I've asked Qcom folks to reproduce this
> issue. We will investigate and hopefully get back with a fix asap.
>
> > This is with the SDX65 modem in the x1e80100 CRD:
> >
> >       17cb:0308

I have a different MHI modem model, but I will try to reproduce this during
the week. Any idea how often the bug occurs?
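
For reference, the behaviour described in the quoted text (a bounded wait for
SBL/Mission mode that returns -ETIMEDOUT, after which the caller falls back to
recovery) can be sketched roughly as below. This is a standalone illustration
in Python, not the MHI driver code: wait_for_ready(), power_up_or_recover(),
read_state, timeout_s and poll_s are all hypothetical names, and the state
strings are only borrowed from the log above.

```python
import errno
import time

# Hypothetical state names for illustration; not the driver's enum values.
READY_STATES = {"SBL", "MISSION"}


def wait_for_ready(read_state, timeout_s, poll_s=0.01):
    """Poll read_state() until a ready state is seen or timeout_s elapses.

    Returns 0 on success and -errno.ETIMEDOUT on timeout, so the wait can
    never block the caller forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_state() in READY_STATES:
            return 0
        if time.monotonic() >= deadline:
            return -errno.ETIMEDOUT
        time.sleep(poll_s)


def power_up_or_recover(read_state, timeout_s):
    """Caller side: on a failed wait, fall back to a recovery path."""
    ret = wait_for_ready(read_state, timeout_s)
    if ret:
        # Assumption: a real caller would unprepare the device and reset it.
        return "recovery"
    return "ready"
```

The point of the sketch is only that every wait has a deadline, so a device
stuck in "SYS ERROR" makes the caller take the recovery branch instead of
hanging the reboot.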

Regards,
Loic
