Date:   Wed, 16 Aug 2023 19:12:27 +0200
From:   Jann Horn <jannh@...gle.com>
To:     "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Jann Horn <jannh@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: maple tree change made it possible for VMA iteration to see same
 VMA twice due to late vma_merge() failure

On Wed, Aug 16, 2023 at 6:18 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
> * Jann Horn <jannh@...gle.com> [230815 15:37]:
> > commit 18b098af2890 ("vma_merge: set vma iterator to correct
> > position.") added a vma_prev(vmi) call to vma_merge() at a point where
> > it's still possible to bail out. My understanding is that this moves
> > the VMA iterator back by one VMA.
> >
> > If you patch some extra logging into the kernel and inject a fake
> > out-of-memory error at the vma_iter_prealloc() call in vma_split() (a
> > real out-of-memory error there is very unlikely to happen in practice,
> > I think - my understanding is that the kernel will basically kill
> > every process on the system except for init before it starts failing
> > GFP_KERNEL allocations that fit within a single slab, unless the
> > allocation uses GFP_ACCOUNT or stuff like that, which the maple tree
> > doesn't):
[...]
> > then you'll get this fun log output, showing that the same VMA
> > (ffff88810c0b5e00) was visited by two iterations of the VMA iteration
> > loop, and on the second iteration, prev==vma:
> >
> > [  326.765586] userfaultfd_register: begin vma iteration
> > [  326.766985] userfaultfd_register: prev=ffff88810c0b5ef0, vma=ffff88810c0b5e00 (0000000000101000-0000000000102000)
> > [  326.768786] userfaultfd_register: vma_merge returned 0000000000000000
> > [  326.769898] userfaultfd_register: prev=ffff88810c0b5e00, vma=ffff88810c0b5e00 (0000000000101000-0000000000102000)
> >
> > I don't know if this can lead to anything bad but it seems pretty
> > clearly unintended?
>
> Yes, unintended.
>
> So we are running out of memory, but since vma_merge() doesn't
> differentiate between failure and 'nothing to merge', we end up in a
> situation that we will revisit the same VMA.
>
> I've been thinking about a way to work this into the interface and I
> don't see a clean way because we (could) do different things before the
> call depending on the situation.
>
> I think we need to undo any vma iterator changes in the failure
> scenarios if there is a chance of the iterator continuing to be used,
> which is probably not limited to just this case.

I don't fully understand the maple tree interface - in the specific
case of vma_merge(), could you move the vma_prev() call down below the
point of no return, after vma_iter_prealloc()? Or does
vma_iter_prealloc() require that the iterator is already in the insert
position?
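
To make the question concrete, here is a toy userspace model of the
ordering issue. It is only a sketch and not kernel code: toy_iter,
toy_next(), toy_prev(), toy_merge() and toy_max_visits() are made-up
names, an array index stands in for the maple tree state, and the
"prealloc" failure is injected by a flag. It does not say anything about
whether vma_iter_prealloc() needs the iterator to be in the insert
position already; it only shows why stepping back before a step that can
still fail makes the caller's loop see the same element twice.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VMAS 3

/* Toy stand-in for a VMA iterator: just an index into a fixed range. */
struct toy_iter {
	int pos;	/* index that the next toy_next() call returns */
};

/* Return the current index and advance, or -1 at the end. */
static int toy_next(struct toy_iter *it)
{
	return it->pos < NR_VMAS ? it->pos++ : -1;
}

/* Move the iterator back by one element. */
static void toy_prev(struct toy_iter *it)
{
	assert(it->pos > 0);
	it->pos--;
}

/*
 * Toy merge: needs the iterator moved back by one element, and has one
 * step that can fail. A false return looks the same to the caller
 * whether it means "nothing to merge" or "allocation failed".
 */
static bool toy_merge(struct toy_iter *it, bool prealloc_fails,
		      bool step_back_late)
{
	if (!step_back_late)
		toy_prev(it);	/* moved before the fallible step */

	if (prealloc_fails)
		return false;	/* bail out without undoing anything */

	if (step_back_late)
		toy_prev(it);	/* moved only after the point of no return */

	/* Pretend the merge happened; leave the iterator after the element. */
	it->pos++;
	return true;
}

/*
 * Walk all elements, injecting a failure into the first merge attempt
 * only. Returns the largest number of times any element was visited.
 */
int toy_max_visits(bool step_back_late)
{
	struct toy_iter it = { .pos = 0 };
	int visits[NR_VMAS] = { 0 };
	bool inject = true;
	int idx, max = 0;

	while ((idx = toy_next(&it)) >= 0) {
		if (++visits[idx] > max)
			max = visits[idx];
		toy_merge(&it, inject, step_back_late);
		inject = false;
	}
	return max;
}
```

In this model, toy_max_visits(false) reports that one element was
visited twice, and toy_max_visits(true) reports that every element was
visited once.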

> I will audit these areas and CC you on the result.

Thanks!
