linux-kernel - Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180622161845.GA3497@redhat.com>
Date:   Fri, 22 Jun 2018 12:18:46 -0400
From:   Jerome Glisse <jglisse@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Chris Wilson <chris@...is-wilson.co.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko
         <mhocko@...e.com>, kvm@...r.kernel.org,  " Radim Krčmář <rkrcmar@...hat.com>, David Airlie <airlied@...ux.ie>,
        Sudeep Dutt <sudeep.dutt@...el.com>,
        dri-devel@...ts.freedesktop.org, linux-mm@...ck.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        "David (ChunMing) Zhou" <David1.Zhou@....com>,
        Dimitri Sivanich <sivanich@....com>,
        linux-rdma@...r.kernel.org, amd-gfx@...ts.freedesktop.org,
        Jason Gunthorpe <jgg@...pe.ca>,
        Doug Ledford <dledford@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        xen-devel@...ts.xenproject.org, intel-gfx@...ts.freedesktop.org,
        Rodrigo@...ck.org, Vivi@...ck.org, Boris@...ck.org,
        Ostrovsky@...ck.org, Juergen@...ck.org, Gross@...ck.org,
        Mike@...ck.org, Marciniszyn@...ck.org, Dennis@...ck.org,
        Dalessandro@...ck.org, Ashutosh@...ck.org, Dixit@...ck.org,
        Alex@...ck.org, Deucher@...ck.org, Paolo@...ck.org,
        Bonzini@...ck.org
Subject: Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for
 mmu notifiers

On Fri, Jun 22, 2018 at 05:57:16PM +0200, Michal Hocko wrote:
> On Fri 22-06-18 16:36:49, Chris Wilson wrote:
> > Quoting Michal Hocko (2018-06-22 16:02:42)
> > > Hi,
> > > this is an RFC and not tested at all. I am not very familiar with the
> > > mmu notifiers semantics very much so this is a crude attempt to achieve
> > > what I need basically. It might be completely wrong but I would like
> > > to discuss what would be a better way if that is the case.
> > > 
> > > get_maintainers gave me quite large list of people to CC so I had to trim
> > > it down. If you think I have forgot somebody, please let me know
> > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > index 854bd51b9478..5285df9331fa 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > @@ -112,10 +112,11 @@ static void del_object(struct i915_mmu_object *mo)
> > >         mo->attached = false;
> > >  }
> > >  
> > > -static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > +static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > >                                                        struct mm_struct *mm,
> > >                                                        unsigned long start,
> > > -                                                      unsigned long end)
> > > +                                                      unsigned long end,
> > > +                                                      bool blockable)
> > >  {
> > >         struct i915_mmu_notifier *mn =
> > >                 container_of(_mn, struct i915_mmu_notifier, mn);
> > > @@ -124,7 +125,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > >         LIST_HEAD(cancelled);
> > >  
> > >         if (RB_EMPTY_ROOT(&mn->objects.rb_root))
> > > -               return;
> > > +               return 0;
> > 
> > The principle wait here is for the HW (even after fixing all the locks
> > to be not so coarse, we still have to wait for the HW to finish its
> > access).
> 
> Is this wait bound or it can take basically arbitrary amount of time?

Arbitrary amount of time but in desktop use case you can assume that
it should never go above 16ms for a 60frame per second rendering of
your desktop (in GPU compute case this kind of assumption does not
hold). Is the process exit_state already updated by the time this mmu
notifier callbacks happen ?

> 
> > The first pass would be then to not do anything here if
> > !blockable.
> 
> something like this? (incremental diff)

What i wanted to do with HMM and mmu notifier is split the invalidation
in 2 pass. First pass tell the drivers to stop/cancel pending jobs that
depends on the range and invalidate internal driver states (like clear
buffer object pages array in case of GPU but not GPU page table). While
the second callback would do the actual wait on the GPU to be done and
update the GPU page table.

Now in this scheme in case the task is already in some exit state and
that all CPU threads are frozen/kill then we can probably find a way to
do the first path mostly lock less. AFAICR nor AMD nor Intel allow to
share userptr bo hence a uptr bo should only ever be access through
ioctl submited by the process.

The second call can then be delayed and ping from time to time to see
if GPU jobs are done.


Note that what you propose might still be useful as in case there is
no buffer object for a range then OOM can make progress in freeing a
range of memory. It is very likely that significant virtual address
range of a process and backing memory can be reclaim that way. This
assume OOM reclaim vma by vma or in some form of granularity like
reclaiming 1GB by 1GB. Or we could also update blocking callback to
return range that are blocking that way OOM can reclaim around.

Cheers,
Jérôme