Date:   Mon, 11 Oct 2021 08:32:05 +0200
From:   Andrea Righi <andrea.righi@...onical.com>
To:     Marco Elver <elver@...gle.com>
Cc:     Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>, kasan-dev@...glegroups.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled

On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@...onical.com> wrote:
> > I can systematically reproduce the following soft lockup w/ the latest
> > 5.15-rc4 kernel (and all the 5.14, 5.13 and 5.12 kernels that I've
> > tested so far).
> >
> > I've found this issue by running systemd autopkgtest (I'm using the
> > latest systemd in Ubuntu - 248.3-1ubuntu7 - but it should happen with
> > any recent version of systemd).
> >
> > I'm running this test inside a local KVM instance and apparently systemd
> > is starting up its own KVM instances to run its tests, so the context is
> > a nested KVM scenario (even if I don't think the nested KVM part really
> > matters).
> >
> > Here's the oops:
> >
> > [   36.466565] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [udevadm:333]
> > [   36.466565] Modules linked in: btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
> > [   36.466565] CPU: 0 PID: 333 Comm: udevadm Not tainted 5.15-rc4
> > [   36.466565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [...]
> >
> > If I disable CONFIG_KFENCE the soft lockup doesn't happen and systemd
> > autotest completes just fine.
> >
> > We've decided to disable KFENCE in the latest Ubuntu Impish kernel
> > (5.13) for now, because of this issue, but I'm still investigating
> > trying to better understand the problem.
> >
> > Any hint / suggestion?
> 
> Can you confirm this is not a QEMU TCG instance? There's been a known
> issue with it: https://bugs.launchpad.net/qemu/+bug/1920934

It looks like systemd is running qemu-system-x86 without any "accel"
options, so IIUC the instance shouldn't use TCG. Is this a correct
assumption or is there a better way to check?
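
For reference, a rough sketch of how the accelerator could be guessed from
a QEMU command line. It rests on an assumption worth double-checking: as
far as QEMU's documented defaults go, TCG is what gets used when no
accelerator is requested at all, so the absence of an "accel" option would
point to TCG rather than away from it. The function name and the sample
command lines are hypothetical; only the -enable-kvm, -accel and
-machine accel= options are taken from QEMU itself.

```python
import shlex


def qemu_accelerator(cmdline: str) -> str:
    """Best-effort guess of the accelerator requested on a QEMU command line.

    Returns the first accelerator named on the command line, or "tcg" when
    none is named (assumption: TCG is QEMU's default accelerator).
    """
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        # Legacy shortcut for selecting KVM.
        if arg in ("-enable-kvm", "--enable-kvm"):
            return "kvm"
        # "-accel kvm" or "-accel tcg,thread=multi": the name comes first.
        if arg in ("-accel", "--accel") and i + 1 < len(args):
            return args[i + 1].split(",")[0]
        # "-machine q35,accel=kvm:tcg": the first entry is the preferred one.
        if arg in ("-machine", "--machine", "-M") and i + 1 < len(args):
            for opt in args[i + 1].split(","):
                if opt.startswith("accel="):
                    return opt[len("accel="):].split(":")[0]
    # Nothing requested: assume the default accelerator.
    return "tcg"
```

The command line of a running instance can be read from
/proc/<pid>/cmdline, and the QEMU monitor's "info kvm" command gives a more
direct answer for a live guest.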

> 
> One thing that I've been wondering is, if we can make
> CONFIG_KFENCE_STATIC_KEYS=n the default, because the static keys
> approach is becoming more trouble than it's worth. It requires us to
> re-benchmark the defaults. If you're thinking of turning KFENCE on by
> default (i.e. CONFIG_KFENCE_SAMPLE_INTERVAL non-zero), you could make
> this decision for Ubuntu with whatever sample interval you choose.
> We've found that for large deployments 500ms or above is more than
> adequate.

Another thing that I forgot to mention is that with
CONFIG_KFENCE_STATIC_KEYS=n the soft lockup doesn't seem to happen.
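
For illustration, the combination discussed above would look roughly like
this in a kernel .config. This is a sketch, not the configuration actually
tested: the 500 ms interval is only the figure quoted above for large
deployments.

```
CONFIG_KFENCE=y
# Sample interval in milliseconds; 0 disables KFENCE sampling.
CONFIG_KFENCE_SAMPLE_INTERVAL=500
# Use a plain branch instead of static keys for the allocation gate.
# CONFIG_KFENCE_STATIC_KEYS is not set
```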

Thanks,
-Andrea
