linux-kernel - Re: riscv: boot failure for 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")

Message-ID: <51339689-b92f-52fb-9202-2b91733f9180@sholland.org>
Date:   Sat, 20 May 2023 22:22:36 -0500
From:   Samuel Holland <samuel@...lland.org>
To:     Drew Fustini <dfustini@...libre.com>
Cc:     linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Conor Dooley <conor@...nel.org>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Andrew Jones <ajones@...tanamicro.com>,
        Anup Patel <anup@...infault.org>,
        Alexandre Ghiti <alexghiti@...osinc.com>
Subject: Re: riscv: boot failure for 3335068f8721 ("riscv: Use PUD/P4D/PGD
 pages for the linear mapping")

Hi Drew,

On 5/20/23 21:05, Drew Fustini wrote:
> Hello, I tested 6.4-rc1 on an internal RISC-V SoC and observed a boot
> failure on a Store/AMO access fault (exception code 7) in __memset().
> stval (i.e. badaddr) was set to 0xffffaf8000000000. This SoC is RV64GC
> with Sv48 so it seems that address is the start of the "direct mapping
> of all physical memory" [1].
> 
> The 6.3 release boots okay and the system is able to operate correctly
> with an Ubuntu 23.04 rootfs on eMMC. Therefore, I decided to bisect and
> I found the failure begins with 3335068f8721 ("riscv: Use PUD/P4D/PGD
> pages for the linear mapping"). The system boots okay with the prior
> commit 8589e346bbb6 ("riscv: Move the linear mapping creation in its
> own function").
> 
> The boot log [2] shows that the fault happens right after buildroot's
> init script [3] uses switch_root to execute init from the Ubuntu rootfs
> on the eMMC.
> 
> DWARF4 is enabled in .config [4] and the decoded stack trace [5] shows:
> 
>   epc : __memset (/eng/dfustini/gitlab/linux/arch/riscv/lib/memset.S:67)
> 
> From memset.S:
> 
>  Line 67:         REG_S a1,        0(t0)
> 
> From the oops:
> 
>  epc : ffffffff81122d6c ra : ffffffff80218504 sp : ffffaf8002e47500
>   gp : ffffffff82695010 tp : ffffaf8002e2ec00 t0 : ffffaf8000000000
>   t1 : 0000000000000080 t2 : 0000000000000001 s0 : ffffaf8002e47550
>   s1 : ffff8d8200000040 a0 : ffffaf8000000000 a1 : 0000000000000000
> 
> Thus I think it is trying to store 0x0 to 0xffffaf8000000000 which is
> the start of the direct map. From the boot log [2], OpenSBI shows:
> 
>  Domain0 Region00 : 0x0000000002080000-0x00000000020bffff M: (I,R,W) S/U: ()
>  Domain0 Region01 : 0x0000008000000000-0x000000800003ffff M: (R,W,X) S/U: ()
>  Domain0 Region02 : 0x0000000002000000-0x000000000207ffff M: (I,R,W) S/U: ()
>  Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
> 
> The DDR memory on this SoC starts at 0x8000000000 with size 2GB. The
> memory node from the device tree [6]:
> 
>         memory@...0000000 {
>                 device_type = "memory";
>                 reg = <0x80 0 0x00000000 0x80000000>;
>         };
> 
> I think the direct map address 0xffffaf8000000000 would map to physical
> address 0x8000000000. Thus I think the attempted store in S-mode to that
> address would violate the PMP settings for Region01.
> 
> I do not yet understand why this happens with 3335068f8721 ("riscv: Use
> PUD/P4D/PGD pages for the linear mapping") but not for the prior commit
> 8589e346bbb6 ("riscv: Move the linear mapping creation in its own
> function").

Where does Linux's DTB come from? It should be the one that was modified
by OpenSBI to add a reserved-memory node matching PMP Region01
(fdt_reserved_memory_fixup()).

Before this commit, Linux ignored the first 2 MiB of physical RAM. So if
OpenSBI was loaded in this region (as it is here: Region01 covers the
first 256 KiB of DRAM), you could get away with ignoring the
firmware-provided DTB; now you actually need to use it, as intended.
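
To make the arithmetic concrete, here is a rough sketch that uses only the
numbers from your report. It assumes the direct map is a plain linear offset
with the start of DRAM mapped at the start of the direct map. The helper names
are made up for illustration and are not kernel or OpenSBI functions.

```python
# Values copied from the report quoted above.
DIRECT_MAP_BASE = 0xFFFFAF8000000000  # start of the Sv48 direct map (also stval/t0)
DRAM_BASE = 0x8000000000              # memory node: reg = <0x80 0 0x00000000 0x80000000>
FW_START = 0x8000000000               # OpenSBI Domain0 Region01 start
FW_END = 0x800003FFFF                 # OpenSBI Domain0 Region01 end (inclusive)
SZ_2M = 2 << 20                       # the 2 MiB Linux used to skip


def direct_map_to_phys(va):
    """ASSUMPTION: linear direct map with DRAM_BASE mapped at DIRECT_MAP_BASE."""
    return va - DIRECT_MAP_BASE + DRAM_BASE


def in_firmware_region(pa):
    """True if pa lies inside the M-mode-only PMP region (Region01)."""
    return FW_START <= pa <= FW_END


def in_first_2m_of_ram(pa):
    """True if pa lies in the first 2 MiB of DRAM."""
    return DRAM_BASE <= pa < DRAM_BASE + SZ_2M


fault_va = 0xFFFFAF8000000000          # store target from the oops
fault_pa = direct_map_to_phys(fault_va)

print(hex(fault_pa))                   # physical address of the faulting store
print(in_firmware_region(fault_pa))    # does it hit the firmware's PMP region?
print(in_first_2m_of_ram(FW_END))      # is the whole region inside the first 2 MiB?
```

Under that assumption the faulting store lands in Region01, and the whole
region sits inside the 2 MiB that Linux used to leave unmapped, which would
explain why the older kernel never touched it.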

Regards,
Samuel