linux-kernel - Bug report: kernel paniced while booting

Message-ID: <tencent_7C3B580B47C1B17C16488EC1@qq.com>
Date:   Mon, 5 Jun 2023 10:52:14 +0000
From:   "Song Shuai" <songshuaishuai@...ylab.org>
To:     "alexghiti" <alexghiti@...osinc.com>,
        "robh" <robh@...nel.org>,
        "ajones" <ajones@...tanamicro.com>,
        "anup" <anup@...infault.org>,
        "palmer" <palmer@...osinc.com>,
        "jeeheng.sia" <jeeheng.sia@...rfivetech.com>,
        "leyfoon.tan" <leyfoon.tan@...rfivetech.com>,
        "mason.huo" <mason.huo@...rfivetech.com>,
        "paul.walmsley" <paul.walmsley@...ive.com>,
        "conor.dooley" <conor.dooley@...rochip.com>,
        "guoren" <guoren@...nel.org>
Cc:     "linux-riscv" <linux-riscv@...ts.infradead.org>,
        "linux-kernel" <linux-kernel@...r.kernel.org>
Subject: Bug report: kernel paniced while booting  

Description of problem:

When booting Linux with the RiscVVirtQemu edk2 firmware, a Store/AMO page fault is trapped and triggers a kernel panic.
The full log is posted at https://termbin.com/nga4.

You can reproduce it with the following steps:

1. Prepare the environment with:
   - QEMU virt: v8.0.0 (with OpenSBI v1.2)
   - edk2: at commit (2bc8545883 "UefiCpuPkg/CpuPageTableLib: Reduce the number of random tests")
   - Linux: v6.4-rc1 or later

2. Start the QEMU virt board:

```sh
$ cat ~/8_riscv/start_latest.sh
#!/bin/bash
/home/song/8_riscv/3_acpi/qemu/ooo/usr/local/bin/qemu-system-riscv64 \
        -s -nographic \
        -drive file=/home/song/8_riscv/3_acpi/Build_virt/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT.fd,if=pflash,format=raw,unit=1 \
        -machine virt,acpi=off -smp 2 -m 2G \
        -kernel /home/song/9_linux/linux/00_rv_def/arch/riscv/boot/Image \
        -initrd /home/song/8_riscv/3_acpi/buildroot/output/images/rootfs.ext2 \
        -append "root=/dev/ram ro console=ttyS0 earlycon=uart8250,mmio,0x10000000 efi=debug loglevel=8 memblock=debug" # appending memtest also triggers a panic
```
3. The kernel then panics as shown in the log linked above.
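
For anyone reproducing this with a different directory layout, the invocation in step 2 can be sketched as a small Python helper. This is only an illustration: the function name `qemu_argv` and the short placeholder paths are hypothetical, while the options themselves are copied from the script above.

```python
import shlex


def qemu_argv(qemu, firmware, kernel, initrd, extra_cmdline=""):
    """Build the QEMU argument list from step 2; all paths are caller-supplied."""
    cmdline = (
        "root=/dev/ram ro console=ttyS0 earlycon=uart8250,mmio,0x10000000 "
        "efi=debug loglevel=8 memblock=debug"
    )
    if extra_cmdline:
        # e.g. "memtest", which the report says also triggers a panic
        cmdline += " " + extra_cmdline
    return [
        qemu, "-s", "-nographic",
        "-drive", f"file={firmware},if=pflash,format=raw,unit=1",
        "-machine", "virt,acpi=off", "-smp", "2", "-m", "2G",
        "-kernel", kernel,
        "-initrd", initrd,
        "-append", cmdline,
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the locations of your own builds.
    argv = qemu_argv("qemu-system-riscv64", "RISCV_VIRT.fd", "Image", "rootfs.ext2")
    print(shlex.join(argv))
```

Passing `extra_cmdline="memtest"` gives the command line for the memtest variant mentioned below.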

Other Information:

1. -------

This report is not identical to my prior report, "kernel paniced when system hibernates" [1], but both
are closely related to commit (3335068f8721 "riscv: Use PUD/P4D/PGD pages for the linear mapping").

With this commit, hibernation traps with an "access fault" when accessing the PMP-protected regions (mmode_resv0@...00000)
from OpenSBI (BTW, hibernation is marked as nonportable by Conor [2]).

In this report, efi_init hands off the memory map from Boot Services to memblock, where mmode_resv0@...00000 is reserved,
so the failure is a "page fault" rather than an "access fault".

Reverting commit 3335068f8721 does fix this panic.
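
The difference between the two fault types can be made concrete with the scause exception codes. The sketch below is illustrative only: the codes are those defined in the RISC-V privileged specification, and the function name `classify_exception` is hypothetical.

```python
# Exception codes from the RISC-V privileged specification
# (scause values with the interrupt bit clear).
SCAUSE_EXCEPTIONS = {
    1: "Instruction access fault",
    5: "Load access fault",
    7: "Store/AMO access fault",
    12: "Instruction page fault",
    13: "Load page fault",
    15: "Store/AMO page fault",
}


def classify_exception(code):
    """Return (name, origin) for an scause exception code."""
    name = SCAUSE_EXCEPTIONS.get(code, "other")
    if name.endswith("access fault"):
        origin = "PMP/PMA check"        # physical memory protection or attributes
    elif name.endswith("page fault"):
        origin = "address translation"  # page table walk or PTE permissions
    else:
        origin = "unclassified"
    return name, origin


if __name__ == "__main__":
    for code in (7, 15):
        print(code, classify_exception(code))
```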

2. -------

As the gdb-pt-dump [3] tool shows, the PTE covering the faulting virtual address has write permission.
Is there another way to trigger a "Store/AMO page fault"? Or did the creation of the linear mapping in commit 3335068f8721 do something wrong?

```
(gdb) p/x $satp
$1 = 0xa000000000081708
(gdb) pt -satp 0xa000000000081708
             Address :     Length   Permissions    
  0xff1bfffffea39000 :     0x1000 | W:1 X:0 R:1 S:1
  0xff1bfffffebf9000 :     0x1000 | W:1 X:0 R:1 S:1
  0xff1bfffffec00000 :   0x400000 | W:1 X:0 R:1 S:1
  0xff60000000000000 :   0x1c0000 | W:1 X:0 R:1 S:1
  0xff60000000200000 :   0xa00000 | W:0 X:0 R:1 S:1
  0xff60000000c00000 : 0x7f000000 | W:1 X:0 R:1 S:1  // badaddr: ff6000007fdb1000
  0xff6000007fdc0000 :    0x3d000 | W:1 X:0 R:1 S:1
  0xff6000007ffbf000 :     0x1000 | W:1 X:0 R:1 S:1
  0xffffffff80000000 :   0xc00000 | W:0 X:1 R:1 S:1
  0xffffffff80c00000 :   0xa00000 | W:1 X:0 R:1 S:1

```
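
The dump can be cross-checked with a few lines of Python. This is a minimal sketch, not part of gdb-pt-dump: the helper names are hypothetical, the satp field layout is the RV64 one from the RISC-V privileged specification, and the Length column is assumed to be the byte length of each contiguous range.

```python
SATP_MODES = {0: "Bare", 8: "Sv39", 9: "Sv48", 10: "Sv57"}


def decode_satp(satp):
    """Split an RV64 satp value into (mode name, ASID, root page table address)."""
    mode = satp >> 60                 # bits 63:60
    asid = (satp >> 44) & 0xFFFF      # bits 59:44
    ppn = satp & ((1 << 44) - 1)      # bits 43:0
    return SATP_MODES.get(mode, "reserved"), asid, ppn << 12


# (start, length, writable) rows copied from the dump above.
RANGES = [
    (0xFF1BFFFFFEA39000, 0x1000, True),
    (0xFF1BFFFFFEBF9000, 0x1000, True),
    (0xFF1BFFFFFEC00000, 0x400000, True),
    (0xFF60000000000000, 0x1C0000, True),
    (0xFF60000000200000, 0xA00000, False),
    (0xFF60000000C00000, 0x7F000000, True),
    (0xFF6000007FDC0000, 0x3D000, True),
    (0xFF6000007FFBF000, 0x1000, True),
    (0xFFFFFFFF80000000, 0xC00000, False),
    (0xFFFFFFFF80C00000, 0xA00000, True),
]


def covering_range(addr, ranges=RANGES):
    """Return the (start, length, writable) row containing addr, or None."""
    for start, length, writable in ranges:
        if start <= addr < start + length:
            return start, length, writable
    return None


if __name__ == "__main__":
    mode, asid, root = decode_satp(0xA000000000081708)
    print(mode, asid, hex(root))
    print(covering_range(0xFF6000007FDB1000))
```

Under that reading of the Length column, the annotated range ends at 0xff60000000c00000 + 0x7f000000 = 0xff6000007fc00000, so it may be worth confirming with this lookup that the faulting address really lies inside a mapped range.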

3. ------

You can also reproduce a similar panic by appending "memtest" to the kernel command line.
I have posted the memtest boot log at https://termbin.com/1twl.

Please correct me if I'm wrong.

[1]: https://lore.kernel.org/linux-riscv/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@mail.gmail.com/
[2]: https://lore.kernel.org/linux-riscv/20230526-astride-detonator-9ae120051159@wendy/
[3]: https://github.com/martinradev/gdb-pt-dump