lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez0ChA80cjg4-=1k8PfXV_4u_YVQay9g_RdrLjbAzch4Gw@mail.gmail.com>
Date:   Tue, 3 Nov 2020 03:11:02 +0100
From:   Jann Horn <jannh@...gle.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        Eric Biederman <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>,
        kernel list <linux-kernel@...r.kernel.org>,
        Will Deacon <will@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC PATCH resend 3/6] mm: Add refcount for preserving mm_struct
 without pgd

On Sat, Oct 17, 2020 at 2:30 AM Jann Horn <jannh@...gle.com> wrote:
> On Sat, Oct 17, 2020 at 1:21 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > On Sat, Oct 17, 2020 at 01:09:12AM +0200, Jann Horn wrote:
> > > Currently, mm_struct has two refcounts:
> > >
> > >  - mm_users: preserves everything - the mm_struct, the page tables, the
> > >    memory mappings, and so on
> > >  - mm_count: preserves the mm_struct and pgd
> > >
> > > However, there are three types of users of mm_struct:
> > >
> > > 1. users that want page tables, memory mappings and so on
> > > 2. users that want to preserve the pgd (for lazy TLB)
> > > 3. users that just want to keep the mm_struct itself around (e.g. for
> > >    mmget_not_zero() or __ptrace_may_access())
> > >
> > > Dropping mm_count references can be messy because dropping mm_count to
> > > zero deletes the pgd, which takes the pgd_lock on x86, meaning it doesn't
> > > work from RCU callbacks (which run in IRQ context). In those cases,
> > > mmdrop_async() must be used to punt the invocation of __mmdrop() to
> > > workqueue context.
> > >
> > > That's fine when mmdrop_async() is a rare case, but the preceding patch
> > > "ptrace: Keep mm around after exit_mm() for __ptrace_may_access()" makes it
> > > the common case; we should probably avoid punting freeing to workqueue
> > > context all the time if we can avoid it?
> > >
> > > To resolve this, add a third refcount that just protects the mm_struct and
> > > the user_ns it points to, and which can be dropped with synchronous freeing
> > > from (almost) any context.
> > >
> > > Signed-off-by: Jann Horn <jannh@...gle.com>
> > > ---
> > >  arch/x86/kernel/tboot.c    |  2 ++
> > >  drivers/firmware/efi/efi.c |  2 ++
> > >  include/linux/mm_types.h   | 13 +++++++++++--
> > >  include/linux/sched/mm.h   | 13 +++++++++++++
> > >  kernel/fork.c              | 14 ++++++++++----
> > >  mm/init-mm.c               |  2 ++
> > >  6 files changed, 40 insertions(+), 6 deletions(-)
> >
> > I think mmu notifiers and the stuff in drivers/infiniband/core/ can be
> > converted to this as well..
> >
> > Actually I kind of wonder if you should go the reverse and find the
> > few callers that care about the pgd and give them a new api with a
> > better name (mmget_pgd?).
>
> Yeah, that might make more sense... as long as I'm really sure about
> all the places I haven't changed. ^^
>
> I'll try to change it as you suggested for v2.

Actually, no - I think it ends up being around 30 mentions of the
"take reference without PGD" function and around 35 mentions of the
"take reference with PGD" function (assuming that all the weird
powerpc stuff I don't understand needs the mm_context to not yet be
destroyed). (A decent chunk of which are all the per-arch functions
for starting secondary processors.) So I don't think doing it the way
you suggested would really make the patch(es) smaller.

And I think that it is helpful for review purposes to have separate
patches for every converted site, and leave things as-is by default.
If the semantics change for every user that is *not* touched by the
patch, that makes it really easy for mistakes to slip through.

I could try to convert more callers though?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ