Message-ID: <20251029032422.GA7297@nxa18884-linux.ap.freescale.net>
Date: Wed, 29 Oct 2025 11:24:22 +0800
From: Peng Fan <peng.fan@....nxp.com>
To: Tanmay Shah <tanmay.shah@....com>
Cc: andersson@...nel.org, mathieu.poirier@...aro.org,
	linux-remoteproc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] remoteproc: xlnx: remote crash recovery

Hi Tanmay,

On Mon, Oct 27, 2025 at 09:57:28PM -0700, Tanmay Shah wrote:
>Remote processor can crash or hang during normal execution. Linux
>remoteproc framework supports different mechanisms to recover the
>remote processor and re-establish the RPMsg communication in such case.
>
>Crash reporting:
>
>1) Using debugfs node
>
>User can report the crash to the core framework via debugfs node using
>following command:
>
>echo 1 > /sys/kernel/debug/remoteproc/remoteproc0/crash
>
>2) Remoteproc notify to the host about crash state and crash reason
>via the resource table
>
>This is a platform specific method where the remote firmware contains
>vendor specific resource to update the crash state and the crash
>reason. Then the remote notifies the crash to the host via mailbox
>notification. The host then will check this resource on every mbox
>notification and reports the crash to the core framework if needed.
>
>Crash recovery mechanism:
>
>There are two mechanisms available to recover the remote processor from
>the crash. 1) boot recovery, 2) attach on recovery
>
>Remoteproc core framework will choose proper mechanism based on the
>rproc features set by the platform driver.
>
>1) Boot recovery
>
>This is the default mechanism to recover the remote processor.
>In this method core framework will first stop the remote processor,
>load the firmware again and then starts the remote processor. On
>AMD-Xilinx platforms this method is supported. The coredump callback in
>the platform driver isn't implemented so far, but that shouldn't cause
>the recovery failure.
>
>2) Attach on recovery
>
>If RPROC_ATTACH_ON_RECOVERY feature is enabled by the platform driver,
>then the core framework will choose this method for recovery.
>
>On zynqmp platform following is the sequence of events expected during
>remoteproc crash and attach on recovery:
>
>a) rproc attach/detach flow is working, and RPMsg comm is established
>b) Remote processor (RPU) crashed (crash not reported yet)
>c) Platform management controller stops and reloads elf on inactive
>   remote processor before reboot
>d) platform management controller reboots the remote processor
>e) Remote processor boots again, and detects previous crash (platform
>   specific mechanism to detect the crash)
>f) Remote processor Reports crash to the Linux (Host) and wait for
>   the recovery.
>g) Linux performs full detach and reattach to remote processor.
>h) Normal RPMsg communication is established.
>
>It is required to destroy all RPMsg related resource and re-create them
>during recovery to establish successful RPMsg communication. To achieve
>this complete rproc_detach followed by rproc_attach calls are needed.
>
>
>Tanmay Shah (3):
>  remoteproc: xlnx: enable boot recovery
>  remoteproc: core: full attach detach during recovery
>  remoteproc: xlnx: add crash detection mechanism
>

I tested this series on i.MX8QM-MEK and saw failures: the 1st test passes,
but the 2nd test fails. Without this patchset, I do not see these failures.
root@...8qmmek:~#
remoteproc remoteproc0: crash detected in imx-rproc: type watchdog
Partition3 reset!
remoteproc remoteproc0: handling crash #1 in imx-rproc
remoteproc remoteproc0: detached remote processor imx-rproc
rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@...00000
virtio_rpmsg_bus virtio0: rpmsg host is online
rproc-virtio rproc-virtio.1.auto: registered virtio0 (type 7)
rproc-virtio rproc-virtio.2.auto: assigned reserved memory node vdevbuffer@...00000
virtio_rpmsg_bus virtio1: rpmsg host is online
rproc-virtio rproc-virtio.2.auto: registered virtio1 (type 7)
remoteproc remoteproc0: remote processor imx-rproc is now attached
virtio_rpmsg_bus virtio1: creating channel rpmsg-openamp-demo-channel addr 0x1e

remoteproc remoteproc0: crash detected in imx-rproc: type watchdog
Partition3 reset!
remoteproc remoteproc0: handling crash #2 in imx-rproc
rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@...00000
virtio_rpmsg_bus virtio4: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.1.auto: registered virtio4 (type 7)
rproc-virtio rproc-virtio.2.auto: assigned reserved memory node vdevbuffer@...00000
virtio_rpmsg_bus virtio5: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.2.auto: registered virtio5 (type 7)
rproc-virtio rproc-virtio.5.auto: assigned reserved memory node vdevbuffer@...00000
virtio_rpmsg_bus virtio6: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.5.auto: registered virtio6 (type 7)
rproc-virtio rproc-virtio.6.auto: assigned reserved memory node vdevbuffer@...00000
virtio_rpmsg_bus virtio7: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.6.auto: registered virtio7 (type 7)
remoteproc remoteproc0: remote processor imx-rproc is now attached
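
For reference, error -12 in the probe failures above is -ENOMEM. A minimal
sketch for summarizing such a log per crash is below. The helper names
probe_failures_by_crash and errno_name are hypothetical (not part of the
kernel or any existing tool), and the regexes assume the message formats
seen in this log.

```python
import errno
import re
from collections import defaultdict

# Patterns assume the dmesg message formats shown in the log above.
CRASH_RE = re.compile(r"handling crash #(\d+) in (\S+)")
PROBE_FAIL_RE = re.compile(
    r"(virtio\d+): probe with driver (\S+) failed with error (-?\d+)"
)


def errno_name(code):
    """Return the symbolic errno name for a kernel-style negative error code."""
    return errno.errorcode.get(abs(code), "unknown")


def probe_failures_by_crash(log_text):
    """Map crash number -> list of (device, error code) probe failures
    that were logged after that crash was handled."""
    failures = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        crash = CRASH_RE.search(line)
        if crash:
            current = int(crash.group(1))
            failures[current]  # list the crash even if it has no failures
            continue
        fail = PROBE_FAIL_RE.search(line)
        if fail and current is not None:
            failures[current].append((fail.group(1), int(fail.group(3))))
    return dict(failures)
```

Feeding the captured console output to probe_failures_by_crash() gives one
entry per handled crash, so a recovery that re-probes cleanly shows an empty
list while a failing one lists each virtio device with its error code.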

Thanks,
Peng
