Message-ID: <CAJuCfpHoMtJdJgXCs45Oi=BUFWVcw76J5Kk-6_1ZuXVvZM_vpA@mail.gmail.com>
Date:   Thu, 10 Mar 2022 15:31:04 -0800
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Liam Howlett <liam.howlett@...cle.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "mhocko@...e.com" <mhocko@...e.com>,
        "shy828301@...il.com" <shy828301@...il.com>,
        "rientjes@...gle.com" <rientjes@...gle.com>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "guro@...com" <guro@...com>, "riel@...riel.com" <riel@...riel.com>,
        "minchan@...nel.org" <minchan@...nel.org>,
        "kirill@...temov.name" <kirill@...temov.name>,
        "aarcange@...hat.com" <aarcange@...hat.com>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "christian@...uner.io" <christian@...uner.io>,
        "hch@...radead.org" <hch@...radead.org>,
        "oleg@...hat.com" <oleg@...hat.com>,
        "david@...hat.com" <david@...hat.com>,
        "jannh@...gle.com" <jannh@...gle.com>,
        "shakeelb@...gle.com" <shakeelb@...gle.com>,
        "luto@...nel.org" <luto@...nel.org>,
        "christian.brauner@...ntu.com" <christian.brauner@...ntu.com>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "jengelh@...i.de" <jengelh@...i.de>,
        "timmurray@...gle.com" <timmurray@...gle.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kernel-team@...roid.com" <kernel-team@...roid.com>,
        "syzbot+2ccf63a4bd07cf39cab0@...kaller.appspotmail.com" 
        <syzbot+2ccf63a4bd07cf39cab0@...kaller.appspotmail.com>
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused
 after being freed

On Thu, Mar 10, 2022 at 2:22 PM Liam Howlett <liam.howlett@...cle.com> wrote:
>
> * Suren Baghdasaryan <surenb@...gle.com> [220310 11:28]:
> > On Thu, Mar 10, 2022 at 7:55 AM Liam Howlett <liam.howlett@...cle.com> wrote:
> > >
> > > * Suren Baghdasaryan <surenb@...gle.com> [220225 00:51]:
> > > > On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <willy@...radead.org> wrote:
> > > > >
> > > > > On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > > > > > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <surenb@...gle.com> wrote:
> > > > > >
> > > > > > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > > > > > otherwise it points to a vma that was freed and when reused leads to
> > > > > > > a use-after-free bug.
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > --- a/mm/mmap.c
> > > > > > > +++ b/mm/mmap.c
> > > > > > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > > > >             vma = remove_vma(vma);
> > > > > > >             cond_resched();
> > > > > > >     }
> > > > > > > +   mm->mmap = NULL;
> > > > > > >     mmap_write_unlock(mm);
> > > > > > >     vm_unacct_memory(nr_accounted);
> > > > > > >  }
> > > > > >
> > > > > > After the Maple tree patches, mm_struct.mmap doesn't exist.  So I'll
> > > > > > revert this fix as part of merging the maple-tree parts of linux-next.
> > > > > > I'll be sending this fix to Linus this week.
> > > > > >
> > > > > > All of which means that the thusly-resolved Maple tree patches might
> > > > > > reintroduce this use-after-free bug.
> > > > >
> > > > > I don't think so?  The problem is that VMAs are (currently) part of
> > > > > two data structures -- the rbtree and the linked list.  remove_vma()
> > > > > only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
> > > > >
> > > > > With maple tree, the linked list goes away.  remove_vma() removes VMAs
> > > > > from the maple tree.  So anyone looking to iterate over all VMAs has to
> > > > > go and look in the maple tree for them ... and there's nothing there.
> > > >
> > > > Yes, I think you are right. With maple trees we don't need this fix.
> > >
> > >
> > > Yes, this is correct.  The maple tree removes the entire linked list...
> > > > but since the mm is unstable in exit_mmap(), I had added the
> > > > destruction of the maple tree there.  Maybe this is the wrong place to
> > > > be destroying the tree tracking the VMAs (although this patch partially
> > > destroys the VMA tracking linked list), but it brought my attention to
> > > the race that this patch solves and the process_mrelease() function.
> > > Couldn't this be avoided by using mmget_not_zero() instead of mmgrab()
> > > in process_mrelease()?
> >
> > That's what we were doing before [1]. That unfortunately has a problem
> > of process_mrelease possibly calling the last mmput and being blocked
> > on IO completion in exit_aio.
>
> Oh, I see. Thanks.
>
>
> > The race between exit_mmap and
> > process_mrelease is solved by using mmap_lock.
>
> I think an important part of the race fix isn't just the lock holding
> but the setting of the start of the linked list to NULL above.  That
> means the code in __oom_reap_task_mm() via process_mrelease() will
> continue to execute but iterate for zero VMAs.
>
> > I think by destroying the maple tree in exit_mmap before the
> > mmap_write_unlock call, you keep things working and functionality
> > intact. Is there any reason this can't be done?
>
> Yes, unfortunately.  If MMF_OOM_SKIP is not set, then process_mrelease()
> will call __oom_reap_task_mm() which will get a null pointer dereference
> or a use after free in the vma iterator as it tries to iterate the maple
> tree.  I think the best plan is to set MMF_OOM_SKIP unconditionally
> when the mmap_write_lock() is acquired.  Doing so will ensure nothing
> will try to gain memory by reaping a task that no longer has memory to
> yield - or at least won't shortly.  If we do use MMF_OOM_SKIP in such a
> way, then I think it is safe to quickly drop the lock?

That technically would work, but it changes the semantics of the
MMF_OOM_SKIP flag from "mm is of no interest for the OOM killer" to
something like "mm is empty", akin to mm->mmap == NULL.
So, is there no way for the maple tree to indicate that it is empty?
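
To illustrate what "mm is empty" means for the linked-list case, here is a
minimal user-space sketch. It is not kernel code: toy_vma, toy_mm,
toy_mm_populate, toy_exit_mmap and toy_reap are made-up names that only
mimic the shape of the teardown loop in exit_mmap() and of the VMA walk in
__oom_reap_task_mm(). mmap_lock is left out, so the two steps are assumed
to run one after the other, as they would when serialized by the lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for vm_area_struct and mm_struct; not the kernel types. */
struct toy_vma {
    struct toy_vma *vm_next;
};

struct toy_mm {
    struct toy_vma *mmap;   /* head of the VMA list */
};

/* Build a list of n VMAs; returns 0 on success, -1 on allocation failure. */
int toy_mm_populate(struct toy_mm *mm, int n)
{
    assert(mm != NULL);
    for (int i = 0; i < n; i++) {
        struct toy_vma *vma = malloc(sizeof(*vma));
        if (!vma)
            return -1;
        vma->vm_next = mm->mmap;
        mm->mmap = vma;
    }
    return 0;
}

/* Models the teardown loop: free every VMA, then reset the list head. */
void toy_exit_mmap(struct toy_mm *mm)
{
    assert(mm != NULL);
    struct toy_vma *vma = mm->mmap;

    while (vma) {
        struct toy_vma *next = vma->vm_next;
        free(vma);
        vma = next;
    }
    /* The one-line fix: without it, mm->mmap still points at freed memory. */
    mm->mmap = NULL;
}

/* Models the reaper's walk: returns the number of VMAs visited. */
int toy_reap(const struct toy_mm *mm)
{
    int visited = 0;

    for (const struct toy_vma *vma = mm->mmap; vma; vma = vma->vm_next)
        visited++;
    return visited;
}
```

With the head reset, a later toy_reap() simply walks zero entries; the
empty list itself is the signal, and no extra flag is needed.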

>
> Also, should process_mrelease() be setting MMF_OOM_VICTIM on this mm?
> It would enable the fast path on a race with exit_mmap() - though that
> may not be desirable?

Michal does not like that approach because, again, process_mrelease is
not the oom-killer and so should not set the MMF_OOM_VICTIM flag. Besides,
we want to get rid of that special mm_is_oom_victim(mm) branch inside
exit_mmap, which reminds me to look into it again.

>
> >
> > [1] ba535c1caf3ee78a ("mm/oom_kill: allow process_mrelease to run
> > under mmap_lock protection")
> >
> > > That would ensure we aren't stepping on an
> > > exit_mmap() and potentially the locking change in exit_mmap() wouldn't
> > > be needed either?  Logically, I view this as process_mrelease() having
> > > issue with the fact that the mmaps are no longer stable in tear down
> > > regardless of the data structure that is used.
> > >
> > > Thanks,
> > > Liam
> > >
> > > --
> > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@...roid.com.
> > >
>
