Message-ID: <CA+zEjCs1FEUTcM+pgV+_MZnixSO5c2hexZFxGxuCQWc2ZMQiRg@mail.gmail.com>
Date:   Thu, 17 Feb 2022 17:53:28 +0100
From:   Alexandre Ghiti <alexandre.ghiti@...onical.com>
To:     Aleksandr Nogikh <nogikh@...gle.com>
Cc:     Dmitry Vyukov <dvyukov@...gle.com>,
        Alexandre Ghiti <alex@...ti.fr>,
        linux-riscv@...ts.infradead.org,
        kasan-dev <kasan-dev@...glegroups.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        syzbot <syzbot+330a558d94b58f7601be@...kaller.appspotmail.com>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] riscv/fixes boot error: can't ssh into the instance

Aleksandr,

On Wed, Feb 16, 2022 at 5:58 PM Alexandre Ghiti
<alexandre.ghiti@...onical.com> wrote:
>
> First, thank you for working on this.
>
> On Wed, Feb 16, 2022 at 5:17 PM Aleksandr Nogikh <nogikh@...gle.com> wrote:
> >
> > If I use just defconfig + DEBUG_VIRTUAL, without any KASAN, it begins
> > to boot, but overwhelms me with tons of `virt_to_phys used for
> > non-linear address:` errors.
> >
> > Like that
> >
> > [    2.701271] virt_to_phys used for non-linear address: 00000000b59e31b6 (0xffffffff806c2000)
> > [    2.701727] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x7e/0x86
> > [    2.702207] Modules linked in:
> > [    2.702393] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W  5.17.0-rc1 #1
> > [    2.702806] Hardware name: riscv-virtio,qemu (DT)
> > [    2.703051] epc : __virt_to_phys+0x7e/0x86
> > [    2.703298]  ra : __virt_to_phys+0x7e/0x86
> > [    2.703547] epc : ffffffff80008448 ra : ffffffff80008448 sp : ffff8f800021bde0
> > [    2.703977]  gp : ffffffff80ed9b30 tp : ffffaf8001230000 t0 : ffffffff80eea56f
> > [    2.704704]  t1 : ffffffff80eea560 t2 : 0000000000000000 s0 : ffff8f800021be00
> > [    2.705153]  s1 : ffffffff806c2000 a0 : 000000000000004f a1 : ffffffff80e723d8
> > [    2.705555]  a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000
> > [    2.706027]  a5 : 0000000000000000 a6 : 0000000000000005 a7 : ffffffffffffffff
> > [    2.706474]  s2 : ffffffff80b80b08 s3 : 00000000000000c2 s4 : ffffffff806c2000
> > [    2.706891]  s5 : ffffffff80edba10 s6 : ffffffff80edb960 s7 : 0000000000000001
> > [    2.707290]  s8 : 00000000000000ff s9 : ffffffff80b80b40 s10: 00000000000000cc
> > [    2.707689]  s11: ffffaf807e1fcf00 t3 : 0000000000000076 t4 : ffffffffffffffff
> > [    2.708092]  t5 : 00000000000001f2 t6 : ffff8f800021bb48
> > [    2.708433] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
> > [    2.708919] [<ffffffff8011416a>] free_reserved_area+0x72/0x19a
> > [    2.709296] [<ffffffff80003a5a>] free_initmem+0x6c/0x7c
> > [    2.709648] [<ffffffff805f60c8>] kernel_init+0x3a/0x10a
> > [    2.709993] [<ffffffff80002fda>] ret_from_exception+0x0/0xc
> > [    2.710310] ---[ end trace 0000000000000000 ]---
> >
>
> I was able to reproduce this: the first one regarding init_zero_pfn is
> legit but not wrong, I have to check when it was introduced and how to
> fix this.
> Regarding the huge batch that follows, at first sight, I would say
> this is linked to my sv48 patchset but that does not seem important as
> the address is a kernel mapping address so the use of virt_to_phys is
> right.
>
> > On Wed, Feb 16, 2022 at 5:09 PM Aleksandr Nogikh <nogikh@...gle.com> wrote:
> > >
> > > On Wed, Feb 16, 2022 at 12:56 PM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > > >
> > > > On Wed, 16 Feb 2022 at 12:47, Aleksandr Nogikh <nogikh@...gle.com> wrote:
> > > > >
> > > > > On Wed, Feb 16, 2022 at 11:37 AM Aleksandr Nogikh <nogikh@...gle.com> wrote:
> > > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > On Wed, Feb 16, 2022 at 5:14 AM Alexandre Ghiti <alex@...ti.fr> wrote:
> > > > > > >
> > > > > > > Hi Dmitry,
> > > > > > >
> > > > > > > On 2/15/22 18:12, Dmitry Vyukov wrote:
> > > > > > > > On Wed, 2 Feb 2022 at 14:18, Alexandre Ghiti
> > > > > > > > <alexandre.ghiti@...onical.com> wrote:
> > > > > > > >> Hi Aleksandr,
> > > > > > > >>
> > > > > > > >> On Wed, Feb 2, 2022 at 12:08 PM Aleksandr Nogikh <nogikh@...gle.com> wrote:
> > > > > > > >>> Hello,
> > > > > > > >>>
> > > > > > > > >>> syzbot has not been able to fuzz its RISC-V instance for 97
> > > > > > > > >> That's a long time, I'll take a look more regularly.
> > > > > > > >>
> > > > > > > >>> days now because the compiled kernel cannot boot. I bisected the issue
> > > > > > > >>> to the following commit:
> > > > > > > >>>
> > > > > > > >>> commit 54c5639d8f507ebefa814f574cb6f763033a72a5
> > > > > > > >>> Author: Alexandre Ghiti <alexandre.ghiti@...onical.com>
> > > > > > > >>> Date:   Fri Oct 29 06:59:27 2021 +0200
> > > > > > > >>>
> > > > > > > >>>      riscv: Fix asan-stack clang build
> > > > > > > >>>
> > > > > > > >>> Apparently, the problem appears on GCC-built RISC-V kernels with KASAN
> > > > > > > >>> enabled. In the previous message syzbot mentions
> > > > > > > >>> "riscv64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU
> > > > > > > > >>> Binutils for Debian) 2.35.2", but the issue also reproduces fine on
> > > > > > > >>> a newer GCC compiler: "riscv64-linux-gnu-gcc (Debian 11.2.0-10)
> > > > > > > >>> 11.2.0, GNU ld (GNU Binutils for Debian) 2.37".
> > > > > > > >>> For convenience, I also duplicate the .config file from the bot's
> > > > > > > >>> message: https://syzkaller.appspot.com/x/.config?x=522544a2e0ef2a7d
> > > > > > > >>>
> > > > > > > >>> Can someone with KASAN and RISC-V expertise please take a look?
> > > > > > > >> I'll take a look at that today.
> > > > > > > >>
> > > > > > > >> Thanks for reporting the issue,
> > > > > > > >
> > > > > > >
> > > > > > > I took a quick look, not enough to fix it but I know the issue comes
> > > > > > > from the inline instrumentation, I have no problem with the outline
> > > > > > > instrumentation. I need to find some cycles to work on this, my goal is
> > > > > > > to fix this for 5.17.
> > > > > >
> > > > > > Thanks for the update!
> > > > > >
> > > > > > Can you please share the .config with which you tested the outline
> > > > > > instrumentation?
> > > > > > I updated the syzbot config to use KASAN_OUTLINE instead of KASAN_INLINE,
> > > > > > but it still does not boot :(
> > > > > >
> > > > > > Here's what I used:
> > > > > > https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138
> > > > >
> > > > > Update: it doesn't boot with that big config, but boots if I generate
> > > > > a simple one with KASAN_OUTLINE:
> > > > >
> > > > > make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > > > > ./scripts/config -e KASAN -e KASAN_OUTLINE
> > > > > make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > > > >
> > > > > And it indeed doesn't work if I use KASAN_INLINE.
> > > >
> > > > It may be an issue with code size. Full syzbot config + KASAN + KCOV
> > > > produces a huge .text. It may be hitting some limitation in the
> > > > bootloader/kernel bootstrap code.
>
> I took a quick glance and it traps on a KASAN address that is not
> mapped, either because it is too soon or because the mapping failed
> somehow.
>
> I'll definitely dive into that tomorrow, sorry for being slow here and
> thanks again for all your work, that helps a lot.
>
> Thanks,
>
> Alex
>
> > >
> > > I bisected the difference between the config we use on syzbot and the
> > > simple one that was generated like I described above.
> > > Turns out that it's the DEBUG_VIRTUAL config that makes the difference.
> > >
> > > make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > > ./scripts/config -e KASAN -e KASAN_OUTLINE -e DEBUG_VIRTUAL
> > > make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > >
> > > And the resulting kernel does not boot.
> > > My env: the `riscv/fixes` branch, commit
> > > 6df2a016c0c8a3d0933ef33dd192ea6606b115e3, qemu 6.2.0.

I fixed a few things today: KASAN + SPARSE_VMEMMAP, DEBUG_VIRTUAL and
maybe KASAN + KCOV.
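
For readers following the DEBUG_VIRTUAL warnings quoted earlier in the
thread, here is a minimal user-space sketch of the kind of address range
check that can produce a "non-linear address" warning for a kernel mapping
address. It is not the kernel's code and not the fix mentioned here: the
layout constants and every helper name below are assumptions chosen for
illustration, loosely modelled on the addresses visible in the quoted trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout constants, for illustration only. */
#define LINEAR_MAP_START 0xffffaf8000000000ULL /* assumed start of the linear map */
#define LINEAR_MAP_SIZE  (1ULL << 37)          /* assumed size of the linear map */
#define KERNEL_MAP_START 0xffffffff80000000ULL /* assumed start of the kernel image mapping */
#define KERNEL_MAP_SIZE  (1ULL << 31)          /* assumed size of the kernel image mapping */

/*
 * True when addr lies in [start, start + size).
 * Unsigned wrap-around makes the test fail when addr < start, and avoids
 * computing start + size, which would overflow for the kernel mapping.
 */
static bool in_range(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr - start < size;
}

static bool is_linear_addr(uint64_t addr)
{
    return in_range(addr, LINEAR_MAP_START, LINEAR_MAP_SIZE);
}

static bool is_kernel_image_addr(uint64_t addr)
{
    return in_range(addr, KERNEL_MAP_START, KERNEL_MAP_SIZE);
}

/* Strict variant: accepts linear-map addresses only. */
static bool strict_check_ok(uint64_t addr)
{
    return is_linear_addr(addr);
}

/* Relaxed variant: also accepts kernel image mapping addresses. */
static bool relaxed_check_ok(uint64_t addr)
{
    return is_linear_addr(addr) || is_kernel_image_addr(addr);
}

/* Runs both variants on three addresses taken from the quoted trace. */
static void check_examples(void)
{
    uint64_t kernel_addr = 0xffffffff806c2000ULL; /* address from the warning */
    uint64_t linear_addr = 0xffffaf8001230000ULL; /* tp register value */
    uint64_t other_addr  = 0xffff8f800021bde0ULL; /* sp register value */

    assert(!strict_check_ok(kernel_addr)); /* strict check would warn here */
    assert(relaxed_check_ok(kernel_addr)); /* relaxed check accepts it */
    assert(strict_check_ok(linear_addr));
    assert(!relaxed_check_ok(other_addr)); /* outside both assumed ranges */
}
```

Calling check_examples() from a small main() exercises both variants. Under
these assumed ranges, the strict variant rejects 0xffffffff806c2000 while
the relaxed one accepts it, which matches the observation above that the
warned address is a kernel mapping address.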

With those small fixes, I was able to boot your large dotconfig with
KASAN_OUTLINE. The inline version still fails; that is my next target
:)
I'll push that tomorrow!

Thanks again,

Alex
