Message-ID: <a27ad1a3-34bd-6b7d-fd09-7737ec3c888d@gmail.com>
Date: Fri, 24 Aug 2018 13:57:52 +0200
From: Christian König <ckoenig.leichtzumerken@...il.com>
To: Michal Hocko <mhocko@...nel.org>, christian.koenig@....com
Cc: kvm@...r.kernel.org,
Radim Krčmář <rkrcmar@...hat.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Sudeep Dutt <sudeep.dutt@...el.com>,
dri-devel@...ts.freedesktop.org, linux-mm@...ck.org,
Andrea Arcangeli <aarcange@...hat.com>,
"David (ChunMing) Zhou" <David1.Zhou@....com>,
Dimitri Sivanich <sivanich@....com>,
Jason Gunthorpe <jgg@...pe.ca>, linux-rdma@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, David Airlie <airlied@...ux.ie>,
Doug Ledford <dledford@...hat.com>,
David Rientjes <rientjes@...gle.com>,
xen-devel@...ts.xenproject.org, intel-gfx@...ts.freedesktop.org,
Jani Nikula <jani.nikula@...ux.intel.com>,
Leon Romanovsky <leonro@...lanox.com>,
Jérôme Glisse <jglisse@...hat.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Juergen Gross <jgross@...e.com>,
Mike Marciniszyn <mike.marciniszyn@...el.com>,
Dennis Dalessandro <dennis.dalessandro@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Ashutosh Dixit <ashutosh.dixit@...el.com>,
Alex Deucher <alexander.deucher@....com>,
Paolo Bonzini <pbonzini@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Felix Kuehling <felix.kuehling@....com>
Subject: Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On 24.08.2018 at 13:52, Michal Hocko wrote:
> On Fri 24-08-18 13:43:16, Christian König wrote:
>> On 24.08.2018 at 13:32, Michal Hocko wrote:
>>> On Fri 24-08-18 19:54:19, Tetsuo Handa wrote:
>>>> Two more worries for this patch.
>>>>
>>>>
>>>>
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
>>>>> @@ -178,12 +178,18 @@ void amdgpu_mn_unlock(struct amdgpu_mn *mn)
>>>>> *
>>>>> * @amn: our notifier
>>>>> */
>>>>> -static void amdgpu_mn_read_lock(struct amdgpu_mn *amn)
>>>>> +static int amdgpu_mn_read_lock(struct amdgpu_mn *amn, bool blockable)
>>>>> {
>>>>> - mutex_lock(&amn->read_lock);
>>>>> + if (blockable)
>>>>> + mutex_lock(&amn->read_lock);
>>>>> + else if (!mutex_trylock(&amn->read_lock))
>>>>> + return -EAGAIN;
>>>>> +
>>>>> if (atomic_inc_return(&amn->recursion) == 1)
>>>>> down_read_non_owner(&amn->lock);
>>>> Why don't we need to use trylock here if blockable == false?
>>>> I'd like a comment explaining why it is safe to use a blocking lock here.
>>> Hmm, I am pretty sure I have checked the code but it was quite confusing
>>> so I might have missed something. Double checking now, it seems that
>>> this read_lock is not used anywhere else and it is not _the_ lock we are
>>> interested in. It is the amn->lock (amdgpu_mn_lock) which matters, as
>>> it is taken in exclusive mode for expensive operations.
>> The write side of the lock is only taken in the command submission IOCTL.
>>
>> So you actually don't need to change anything here (even the proposed
>> changes are overkill) since we can't tear down the struct_mm while an IOCTL
>> is still using it.
> I am not so sure. We are not in the mm destruction phase yet. This is
> mostly about the oom context which might fire right during the IOCTL. If
> any of the paths holding the write lock blocks for an unbounded
> amount of time, or even worse allocates memory, then we are screwed. So
> we need to back off when blockable = false.
Oh yeah, good point. I hadn't thought about that possibility.
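
So we probably want to back off on amn->lock as well when blockable is
false. Just as a sketch of the direction (only the hunk quoted above is
from the patch; the trailing mutex_unlock() and the error handling here
are my assumption, and a proper version would still need to sort out how
a trylock pairs with up_read_non_owner() on the unlock side):

static int amdgpu_mn_read_lock(struct amdgpu_mn *amn, bool blockable)
{
	if (blockable)
		mutex_lock(&amn->read_lock);
	else if (!mutex_trylock(&amn->read_lock))
		return -EAGAIN;

	if (atomic_inc_return(&amn->recursion) == 1) {
		if (blockable) {
			down_read_non_owner(&amn->lock);
		} else if (!down_read_trylock(&amn->lock)) {
			/* can't get amn->lock without sleeping: undo the
			 * recursion count and let the caller (e.g. the OOM
			 * reaper) back off with -EAGAIN
			 */
			atomic_dec(&amn->recursion);
			mutex_unlock(&amn->read_lock);
			return -EAGAIN;
		}
	}
	mutex_unlock(&amn->read_lock);

	return 0;
}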
>
>>> Is that correct, Christian? If this is correct then we need to update the
>>> locking here. I am struggling to grasp the ref counting part. Why can't
>>> all readers simply take the lock rather than rely on somebody else to
>>> take it? 1ed3d2567c800 didn't really help me to understand the locking
>>> scheme here so any help would be appreciated.
>> That won't work like this; there might be multiple
>> invalidate_range_start()/invalidate_range_end() pairs open at the same time.
>> E.g. the lock might be taken recursively, and that is illegal for a
>> rw_semaphore.
> I am not sure I follow. Are you saying that one invalidate_range might
> trigger another one from the same path?
No, but what can happen is:
    invalidate_range_start(A,B);
    invalidate_range_start(C,D);
    ...
    invalidate_range_end(C,D);
    invalidate_range_end(A,B);
Grabbing the read lock twice would be illegal in this case.
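
That is exactly what the recursion counter is for: only the outermost
start/end pair actually touches the rwsem. Roughly (the unlock helper is
paraphrased from memory here, not taken from the hunk above):

/* in amdgpu_mn_read_lock(): only the first, outermost
 * invalidate_range_start() really acquires amn->lock
 */
if (atomic_inc_return(&amn->recursion) == 1)
	down_read_non_owner(&amn->lock);

/* in amdgpu_mn_read_unlock(): only the last, outermost
 * invalidate_range_end() drops it again
 */
if (atomic_dec_return(&amn->recursion) == 0)
	up_read_non_owner(&amn->lock);

In the sequence above the rwsem is taken once for (A,B), the (C,D) pair
only moves the counter between 1 and 2, and the final
invalidate_range_end(A,B) brings it back to 0 and releases the lock.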
Regards,
Christian.