lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <72563756-a53a-4f50-9bf4-87f6b26af036@linux.alibaba.com>
Date: Fri, 12 Sep 2025 15:30:41 +0800
From: Ruidong Tian <tianruidong@...ux.alibaba.com>
To: Himanshu Chauhan <hchauhan@...tanamicro.com>,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-acpi@...r.kernel.org, linux-efi@...r.kernel.org,
 acpica-devel@...ts.linux.dev
Cc: paul.walmsley@...ive.com, palmer@...belt.com, lenb@...nel.org,
 james.morse@....com, tony.luck@...el.com, ardb@...nel.org, conor@...nel.org,
 cleger@...osinc.com, robert.moore@...el.com, sunilvl@...tanamicro.com,
 apatel@...tanamicro.com, xueshuai@...ux.alibaba.com
Subject: Re: [RFC PATCH v1 00/10] Add RAS support for RISC-V architecture


在 2025/2/27 20:36, Himanshu Chauhan 写道:
> This series implements the RAS (Reliability, Availability and Serviceability)
> support for RISC-V architecture using RISC-V RERI specification. It is conformant
> to ACPI platform error interfaces (APEI). It uses the highest priority
> Supervisor Software Events (SSE)[2] to deliver the hardware error events to the kernel.
> The SSE implemetation has already been merged in OpenSBI. Clement has sent a patch series for
> its implemenation in Linux kernel.[5]
>
> The GHES driver framework is used as is with the following changes for RISC-V:
> 	1. Register each ghes entry with SSE layer. Ghes notification vector is SSE event.
> 	2. Add RISC-V specific entries for processor type and ISA string
> 	3. Add fixmap indices GHES SSE Low and High Priority to help map and read from
> 	   physical addresses present in GHES entry.
> 	4. Other changes to build/configure the RAS support
>
> How to Use:
> ----------
> This RAS stack consists of Qemu[3], OpenSBI, EDK2[4], Linux kernel and devmem utility to inject and trigger
> errors. Qemu [Ref.] has support to emulate RISC-V RERI. The RAS agent is implemented in OpenSBI which
> creates CPER records. EDK2 generates HEST table and populates it with GHES entries with the help of
> OpenSBI.
>
> Qemu Command:
> ------------
> <qemu-dir>/build/qemu-system-riscv64 \
>      -s -accel tcg -m 4096 -smp 2 \
>      -cpu rv64,smepmp=false \
>      -serial mon:stdio \
>      -d guest_errors -D ./qemu.log \
>      -bios <opensbi-dir>/build/platform/generic/firmware/fw_dynamic.bin \
>      -monitor telnet:127.0.0.1:55555,server,nowait \
>      -device virtio-gpu-pci -full-screen \
>      -device qemu-xhci \
>      -device usb-kbd \
>      -blockdev node-name=pflash0,driver=file,read-only=on,filename=<edk2-build-dir>/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT_CODE.fd \
>      -blockdev node-name=pflash1,driver=file,filename=<edk2-build-dir>/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT_VARS.fd \
>      -M virt,pflash0=pflash0,pflash1=pflash1,rpmi=true,reri=true,aia=aplic-imsic \
>      -kernel <kernel image> \
>      -initrd <rootfs image> \
>      -append "root=/dev/ram rw console=ttyS0 earlycon=uart8250,mmio,0x10000000"
>
> Error Injection & Triggering:
> ----------------------------
> devmem 0x4010040 32 0x2a1
> devmem 0x4010048 32 0x9001404
> devmem 0x4010044 8 1
>
> The above commands injects a TLB error on CPU 0.
>
> Sample Output (CPU 0):
> ---------------------
> [   34.370282] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [   34.371375] {1}[Hardware Error]: event severity: recoverable
> [   34.372149] {1}[Hardware Error]:  Error 0, type: recoverable
> [   34.372756] {1}[Hardware Error]:   section_type: general processor error
> [   34.373357] {1}[Hardware Error]:   processor_type: 3, RISCV
> [   34.373806] {1}[Hardware Error]:   processor_isa: 6, RISCV64
> [   34.374294] {1}[Hardware Error]:   error_type: 0x02
> [   34.374845] {1}[Hardware Error]:   TLB error
> [   34.375448] {1}[Hardware Error]:   operation: 1, data read
> [   34.376100] {1}[Hardware Error]:   target_address: 0x0000000000000000
>
> References:
> ----------
> [1] RERI Specification: https://github.com/riscv-non-isa/riscv-ras-eri/releases/download/v1.0/riscv-reri.pdf
> [2] SSE Section in OpenSBI v3.0: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/v3.0-rc3/riscv-sbi.pdf
> [3] Qemu source (with RERI emulation support): https://github.com/ventanamicro/qemu.git (branch: dev-upstream)
> [4] EDK2: https://github.com/ventanamicro/edk2.git (branch: dev-upstream)
> [5] SSE Kernel Patches: https://lore.kernel.org/linux-riscv/649fdead-09b0-4f94-a6ff-099fc970d890@rivosinc.com/T/

Hi,

Thanks for this series.

I'm doing some work related to your patch. Besides SSE, I'm working on support
for another notification type for synchronous hardware errors (e.g., on a poison
read), which called Hardware Error Exception (HEE) in Dhaval Sharma's UEFI
proposal[0] in PRS-TG.  I have a patch for HEE support which I've sent out
separately[1].

Perhaps we could merge my work into your patchset to bringing a complete RAS
solution to the RISC-V architecture? Or, I'm also happy to wait for your patches
to land and then continue my work on top.

Let me know what you think would be best.

Cheers,
Ruidong Tian

[0]: https://lists.riscv.org/g/tech-prs/topic/risc_v_ras_related_ecrs/113685653
[1]: https://lore.kernel.org/all/20250910093347.75822-6-tianruidong@linux.alibaba.com/

> Himanshu Chauhan (10):
>    riscv: Define ioremap_cache for RISC-V
>    riscv: Define arch_apei_get_mem_attribute for RISC-V
>    acpi: Introduce SSE in HEST notification types
>    riscv: Add fixmap indices for GHES IRQ and SSE contexts
>    riscv: conditionally compile GHES NMI spool function
>    riscv: Add functions to register ghes having SSE notification
>    riscv: Add RISC-V entries in processor type and ISA strings
>    riscv: Introduce HEST SSE notification handlers
>    riscv: Add config option to enable APEI SSE handler
>    riscv: Enable APEI and NMI safe cmpxchg options required for RAS
>
>   arch/riscv/Kconfig                 |   2 +
>   arch/riscv/include/asm/acpi.h      |  20 ++++
>   arch/riscv/include/asm/fixmap.h    |   8 ++
>   arch/riscv/include/asm/io.h        |   3 +
>   drivers/acpi/apei/Kconfig          |   5 +
>   drivers/acpi/apei/ghes.c           | 102 +++++++++++++++++---
>   drivers/firmware/efi/cper.c        |   3 +
>   drivers/firmware/riscv/riscv_sse.c | 147 +++++++++++++++++++++++++++++
>   include/acpi/actbl1.h              |   3 +-
>   include/linux/riscv_sse.h          |  15 +++
>   10 files changed, 296 insertions(+), 12 deletions(-)
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ