[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2bea526c-7360-550d-0da6-5598259d04b6@linux.intel.com>
Date: Fri, 6 Oct 2017 15:44:45 +0100
From: Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To: Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
Daniel Vetter <daniel.vetter@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Sasha Levin <alexander.levin@...izon.com>
Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915: Preallocate our mmu notifier
workequeu to unbreak cpu hotplug deadlock
On 06/10/2017 15:23, Daniel Vetter wrote:
> On Fri, Oct 06, 2017 at 12:34:02PM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/10/2017 10:06, Daniel Vetter wrote:
>>> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
>>> seems to have uncovered a few more rules about what is allowed and
>>> isn't.
>>>
>>> This one here seems to indicate that allocating a work-queue while
>>> holding mmap_sem is a no-go, so let's try to preallocate it.
>>>
>>> Of course another way to break this chain would be somewhere in the
>>> cpu hotplug code, since this isn't the only trace we're finding now
>>> which goes through msr_create_device.
>>>
>>> Full lockdep splat:
[snipped lockdep splat]
>>> v2: Set ret correctly when we raced with another thread.
>>>
>>> v3: Use Chris' diff. Attach the right lockdep splat.
>>>
>>> Cc: Chris Wilson <chris@...is-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@...el.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>
>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>> Cc: Sasha Levin <alexander.levin@...izon.com>
>>> Cc: Marta Lofstedt <marta.lofstedt@...el.com>
>>> Cc: Tejun Heo <tj@...nel.org>
>>> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
>>> Signed-off-by: Daniel Vetter <daniel.vetter@...el.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem_userptr.c | 35 +++++++++++++++++++--------------
>>> 1 file changed, 20 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> index 2d4996de7331..f9b3406401af 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
>>> i915_mmu_notifier_create(struct mm_struct *mm)
>>> {
>>> struct i915_mmu_notifier *mn;
>>> - int ret;
>>> mn = kmalloc(sizeof(*mn), GFP_KERNEL);
>>> if (mn == NULL)
>>> @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>>> return ERR_PTR(-ENOMEM);
>>> }
>>> - /* Protected by mmap_sem (write-lock) */
>>> - ret = __mmu_notifier_register(&mn->mn, mm);
>>> - if (ret) {
>>> - destroy_workqueue(mn->wq);
>>> - kfree(mn);
>>> - return ERR_PTR(ret);
>>> - }
>>> -
>>> return mn;
>>> }
>>> @@ -210,23 +201,37 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
>>> static struct i915_mmu_notifier *
>>> i915_mmu_notifier_find(struct i915_mm_struct *mm)
>>> {
>>> - struct i915_mmu_notifier *mn = mm->mn;
>>> + struct i915_mmu_notifier *mn;
>>> + int err;
>>> mn = mm->mn;
>>> if (mn)
>>> return mn;
>>> + mn = i915_mmu_notifier_create(mm->mm);
>>> + if (IS_ERR(mn))
>>> + return mn;
>>
>> Strictly speaking we don't want to fail just yet, only it we actually needed
>> a new notifier and we failed to create it.
>
> The check 2 lines above not good enough? It's somewhat racy, but I'm not
> sure what value we provide by being perfectly correct against low memory.
> This thread racing against a 2nd one, where the minimal allocation of the
> 2nd one pushed us perfectly over the oom threshold seems a very unlikely
> scenario.
>
> Also, small allocations actually never fail :-)
Yes, but, we otherwise make each other re-spin for much smaller things
than bailout logic being conceptually at the wrong place. So for me I'd
like a respin. It's not complicated at all, just move the bailout to to
before the __mmu_notifier_register:
...
err = 0;
if (IS_ERR(mn))
err = PTR_ERR(..);
...
if (mana->manah == NULL) { /* ;-D */
/* Protect by mmap_sem...
if (err == 0) {
err = __mmu_notifier_register(..);
...
}
}
...
if (mn && !IS_ERR(mn)) {
...free...
}
I think.. ?
R-b on this, plus below, unless I got something wrong.
>
>>
>>> +
>>> + err = 0;
>>> down_write(&mm->mm->mmap_sem);
>>> mutex_lock(&mm->i915->mm_lock);
>>> - if ((mn = mm->mn) == NULL) {ed
>>> - mn = i915_mmu_notifier_create(mm->mm);
>>> - if (!IS_ERR(mn))
>>> - mm->mn = mn;
>>> + if (mm->mn == NULL) {
>>> + /* Protected by mmap_sem (write-lock) */
>>> + err = __mmu_notifier_register(&mn->mn, mm->mm);
>>> + if (!err) {
>>> + /* Protected by mm_lock */
>>> + mm->mn = fetch_and_zero(&mn);
>>> + }
>>> }
>>> mutex_unlock(&mm->i915->mm_lock);
>>> up_write(&mm->mm->mmap_sem);
>>> - return mn;
>>> + if (mn) {
>>> + destroy_workqueue(mn->wq);
>>> + kfree(mn);
>>> + }
>>> +
>>> + return err ? ERR_PTR(err) : mm->mn;
>>> }
>>> static int
>>>
>>
>> Otherwise looks good to me.
>>
>> I would also put a note in the commit on how working around the locking
>> issue is also beneficial to performance with moving the allocation step
>> outside the mmap_sem.
>
> Yeah Chris brought that up too, I don't really buy it given how
> heavy-weight __mmu_notifier_register is. But I can add something like:
>
> "This also has the minor benefit of slightly reducing the critical
> section where we hold mmap_sem."
>
> r-b with that added to the commit message?
I think for me it is more making it clear that we are working around
something where we are not strictly broken by design with a, however
meaningless, optimisation at the same time.
Regards,
Tvrtko
Powered by blists - more mailing lists