Message-ID: <CACT4Y+auMTWM9NMHAfyBjuSc6o7+7VkCxBgp6AodHk8XUu4VWA@mail.gmail.com>
Date:   Fri, 12 Mar 2021 11:56:40 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Will Deacon <will@...nel.org>
Cc:     syzbot <syzbot+45b6fce29ff97069e2c5@...kaller.appspotmail.com>,
        Dave Martin <Dave.Martin@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mark Rutland <mark.rutland@....com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Andrey Konovalov <andreyknvl@...gle.com>
Subject: Re: WARNING in __do_kernel_fault

On Wed, Jan 27, 2021 at 6:34 PM Will Deacon <will@...nel.org> wrote:
>
> On Wed, Jan 27, 2021 at 06:24:22PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 27, 2021 at 6:15 PM Will Deacon <will@...nel.org> wrote:
> > >
> > > On Wed, Jan 27, 2021 at 06:00:30PM +0100, Dmitry Vyukov wrote:
> > > > On Wed, Jan 27, 2021 at 5:56 PM syzbot
> > > > <syzbot+45b6fce29ff97069e2c5@...kaller.appspotmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit:    2ab38c17 mailmap: remove the "repo-abbrev" comment
> > > > > git tree:       upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15a25264d00000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=ad43be24faf1194c
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=45b6fce29ff97069e2c5
> > > > > userspace arch: arm64
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+45b6fce29ff97069e2c5@...kaller.appspotmail.com
> > > >
> > > > This happens on an arm64 instance with MTE enabled.
> > > > There is a GPF in reiserfs_xattr_init on x86_64 reported:
> > > > https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde
> > > > so I would assume it's just a plain NULL deref. Is this WARNING not
> > > > indicative of a kernel bug? Or is there something special about this
> > > > particular NULL deref?
> > >
> > > Congratulations, you're the first person to trigger this warning!
> > >
> > > This fires if we take an unexpected data abort in the kernel but when we
> > > get into the fault handler the page-table looks ok (according to the CPU via
> > > an 'AT' instruction). Are you using QEMU system emulation? Perhaps its
> > > handling of AT isn't quite right.
> >
> > Yes, it's qemu-system-aarch64 5.2 with -machine virt,mte=on -cpu max.
> > Do you see any way forward for this issue? Can we somehow prove/disprove
> > that qemu is at fault?
> > The instance just started running, but it seems to be the most common
> > crash so far and it seems to happen on _all_ gpf's.
> > You can see all arm64 crashes so far here:
> > https://syzkaller.appspot.com/upstream?manager=ci-qemu2-arm64-mte
> > They all happen in reiserfs_security_init, but locally I got a bunch
> > of different stacks, e.g.:
>
> Your best bet is to hack is_spurious_el1_translation_fault() to dump addr,
> esr and par, then we can help decipher the logs here. It could also easily be
> a bug in that code, since it hasn't been run before (well, other than
> contrived testing when I wrote it).

Should dumping of addr/esr/par be included in the mainline kernel code if
this WARNING is not decipherable without this info?
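
To make concrete what deciphering such a dump could involve, here is a
minimal user-space sketch in Python. It is not the kernel's code. The
helper names (decode, looks_spurious), the field layouts for ESR_ELx
and PAR_EL1 (EC in bits [31:26], fault status code in bits [5:0], PAR
fault flag in bit 0, PAR fault status in bits [6:1]) and the decision
rule are assumptions based on my reading of the Arm architecture and of
the check described above, so verify them against the real sources. The
sample values are invented for illustration and do not come from any
syzbot log.

```python
# Assumed field layouts; check against the Arm ARM before relying on them.
ESR_EC_SHIFT = 26
ESR_EC_MASK = 0x3F
ESR_EC_DABT_CUR = 0x25   # data abort taken from the current exception level
ESR_FSC_MASK = 0x3F      # fault status code, ISS[5:0]
FSC_TYPE_MASK = 0x3C     # fault type with the level bits masked off
FSC_TYPE_FAULT = 0x04    # translation fault, any level
PAR_F = 1 << 0           # set when the AT instruction itself faulted
PAR_FST_SHIFT = 1
PAR_FST_MASK = 0x3F


def decode(addr, esr, par):
    """Split a dumped (addr, esr, par) triple into the relevant fields."""
    at_faulted = bool(par & PAR_F)
    return {
        "addr": f"{addr:#018x}",
        "esr_ec": (esr >> ESR_EC_SHIFT) & ESR_EC_MASK,
        "esr_fsc": esr & ESR_FSC_MASK,
        "par_f": at_faulted,
        # The fault status field is only meaningful when PAR.F is set.
        "par_fst": (par >> PAR_FST_SHIFT) & PAR_FST_MASK if at_faulted else None,
    }


def looks_spurious(esr, par):
    """Hypothetical model of the check: the abort was a translation fault,
    yet the AT result does not report a translation fault."""
    ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK
    fsc = esr & ESR_FSC_MASK
    if ec != ESR_EC_DABT_CUR or (fsc & FSC_TYPE_MASK) != FSC_TYPE_FAULT:
        return False  # not a kernel translation fault, nothing to compare
    if not (par & PAR_F):
        return True   # AT found a valid translation
    fst = (par >> PAR_FST_SHIFT) & PAR_FST_MASK
    return (fst & FSC_TYPE_MASK) != FSC_TYPE_FAULT


if __name__ == "__main__":
    # Invented sample: translation fault in ESR, AT reports success.
    print(decode(0x0, 0x96000004, 0x0), looks_spurious(0x96000004, 0x0))
```

With a triple like this in the log, one could see at a glance whether
AT reported a valid translation or a different fault type than the
original abort, which is the information needed to tell an emulator
issue apart from a bug in the check.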

Also, Andrey localized this to the mte=on,virtualization=on combination.
Does this point towards a qemu bug?