[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180625133157.GL28965@dhcp22.suse.cz>
Date: Mon, 25 Jun 2018 15:31:57 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Felix Kuehling <felix.kuehling@....com>
Cc: Christian König <christian.koenig@....com>,
LKML <linux-kernel@...r.kernel.org>,
"David (ChunMing) Zhou" <David1.Zhou@....com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
Alex Deucher <alexander.deucher@....com>,
David Airlie <airlied@...ux.ie>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Doug Ledford <dledford@...hat.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Mike Marciniszyn <mike.marciniszyn@...el.com>,
Dennis Dalessandro <dennis.dalessandro@...el.com>,
Sudeep Dutt <sudeep.dutt@...el.com>,
Ashutosh Dixit <ashutosh.dixit@...el.com>,
Dimitri Sivanich <sivanich@....com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Juergen Gross <jgross@...e.com>,
Jérôme Glisse <jglisse@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>, kvm@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
intel-gfx@...ts.freedesktop.org, linux-rdma@...r.kernel.org,
xen-devel@...ts.xenproject.org, linux-mm@...ck.org,
David Rientjes <rientjes@...gle.com>
Subject: Re: [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon 25-06-18 10:01:03, Michal Hocko wrote:
> On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
> > On 2018-06-22 11:24 AM, Michal Hocko wrote:
> > > On Fri 22-06-18 17:13:02, Christian König wrote:
> > >> Hi Michal,
> > >>
> > >> [Adding Felix as well]
> > >>
> > >> Well first of all you have a misconception why at least the AMD graphics
> > >> driver need to be able to sleep in an MMU notifier: We need to sleep because
> > >> we need to wait for hardware operations to finish and *NOT* because we need
> > >> to wait for locks.
> > >>
> > >> I'm not sure if your flag now means that you generally can't sleep in MMU
> > >> notifiers any more, but if that's the case at least AMD hardware will break
> > >> badly. In our case the approach of waiting for a short time for the process
> > >> to be reaped and then select another victim actually sounds like the right
> > >> thing to do.
> > > Well, I do not need to make the notifier code non blocking all the time.
> > > All I need is to ensure that it won't sleep if the flag says so and
> > > return -EAGAIN instead.
> > >
> > > So here is what I do for amdgpu:
> >
> > In the case of KFD we also need to take the DQM lock:
> >
> > amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
> > kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
> >
> > So we'd need to pass the blockable parameter all the way through that
> > call chain.
>
> Thanks, I have missed that part. So I guess I will start with something
> similar to intel-gfx and back off when the current range needs some
> treatment. So this on top. Does it look correct?
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index d138a526feff..e2d422b3eb0b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> struct amdgpu_mn_node *node;
> struct amdgpu_bo *bo;
>
> + if (!blockable) {
> + amdgpu_mn_read_unlock();
> + return -EAGAIN;
> + }
> +
> node = container_of(it, struct amdgpu_mn_node, it);
> it = interval_tree_iter_next(it, start, end);
Ble, just noticed that half of the change didn't get to git index...
This is what I have
commit c4701b36ac2802b903db3d05cf77c030fccce3a8
Author: Michal Hocko <mhocko@...e.com>
Date: Mon Jun 25 15:24:03 2018 +0200
fold me
- amd gpu notifiers can sleep deeper in the callchain (evict_process_queues_cpsch
on a lock and amdgpu_mn_invalidate_node on unbound timeout) make sure
we bail out when we have an intersecting range for starter
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..3399a4a927fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -225,6 +225,11 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
while (it) {
struct amdgpu_mn_node *node;
+ if (!blockable) {
+ amdgpu_mn_read_unlock(rmn);
+ return -EAGAIN;
+ }
+
node = container_of(it, struct amdgpu_mn_node, it);
it = interval_tree_iter_next(it, start, end);
@@ -266,6 +271,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
struct amdgpu_mn_node *node;
struct amdgpu_bo *bo;
+ if (!blockable) {
+ amdgpu_mn_read_unlock(rmn);
+ return -EAGAIN;
+ }
+
node = container_of(it, struct amdgpu_mn_node, it);
it = interval_tree_iter_next(it, start, end);
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists