Message-ID: <20171005162840.nr5wkvt6rcmbixmu@phenom.ffwll.local>
Date: Thu, 5 Oct 2017 18:28:40 +0200
From: Daniel Vetter <daniel@...ll.ch>
To: Thomas Gleixner <tglx@...utronix.de>, Tejun Heo <tj@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>
Cc: Daniel Vetter <daniel@...ll.ch>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
LKML <linux-kernel@...r.kernel.org>,
Chris Wilson <chris@...is-wilson.co.uk>,
Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Sasha Levin <alexander.levin@...izon.com>,
Daniel Vetter <daniel.vetter@...el.com>
Subject: Re: [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu
hotplug deadlock
On Thu, Oct 05, 2017 at 06:19:30PM +0200, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Daniel Vetter wrote:
> > On Thu, Oct 05, 2017 at 05:23:20PM +0200, Thomas Gleixner wrote:
> > > Aside of that, is it really required to use stomp_machine() for this
> > > synchronization? We certainly have less intrusive mechanisms than that.
> >
> > Yeah, the stop_machine needs to go, I'm working on something that uses
> > rcu_read_lock+synchronize_rcu for this case. Probably shouldn't have
> > merged even.
> >
> > Now this one isn't the one I wanted to fix with this patch since there's
> > clearly something dubious going on on the i915 side too.
>
> I already wondered :)
>
> > The proper trace, with the same part on the cpu hotplug side, highlights
> > that you can't allocate a workqueue while holding mmap_sem. That one
> > matches patch description&diff a bit better :-)
>
> > Sorry for misleading you, should have checked to attach the right one. No
> > stop_machine()/i915_gem_set_wedged() in the below one.
>
> Well the problem is more or less the same and what I said about solving it
> in a different place is still valid. I think about it some more, but don't
> expect wonders :)

Yeah, I just want to make you aware that there are now new implications in
the locking maze, and that we should decide overall to break the loop in the
right place. Also adding Tejun, since this is about workqueues; I forgot him.

tldr for Tejun: The new cross-release stuff in lockdep seems to indicate
that we cannot allocate a new workqueue while holding mmap_sem. Full
details in the thread.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch