Message-ID: <alpine.DEB.2.20.1702081253590.3536@nanos>
Date: Wed, 8 Feb 2017 13:02:07 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Michal Hocko <mhocko@...nel.org>
cc: Christoph Lameter <cl@...ux.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Vlastimil Babka <vbabka@...e.cz>,
Dmitry Vyukov <dvyukov@...gle.com>, Tejun Heo <tj@...nel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
syzkaller <syzkaller@...glegroups.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc

On Wed, 8 Feb 2017, Michal Hocko wrote:
> On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> > On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > > On Tue, 7 Feb 2017, Michal Hocko wrote:
> > >
> > > > I am always nervous when seeing hotplug locks being used in low level
> > > > code. It has bitten us several times already and those deadlocks are
> > > > quite hard to spot when reviewing the code and very rare to hit so they
> > > > tend to live for a long time.
> > >
> > > Yep. Hotplug events are pretty significant. Using stop_machine_XXXX() etc
> > > would be advisable and that would avoid the taking of locks and get rid of all the
> > > complexity, reduce the code size and make the overall system much more
> > > reliable.
> >
> > Huch? stop_machine() is horrible and heavy weight. Don't go there, there
> > must be simpler solutions than that.
>
> Absolutely agreed. We are in the page allocator path so using the
> stop_machine* is just ridiculous. And, in fact, there is a much simpler
> solution [1]
>
> [1] http://lkml.kernel.org/r/20170207201950.20482-1-mhocko@kernel.org

Well, yes. It's simple, but from an RT point of view I really don't like
it as we have to fix it up again.

On RT we solved the problem of the page allocator differently which allows
us to do drain_all_pages() from the caller CPU as a side effect. That's
interesting not only for RT, it's also interesting for NOHZ FULL scenarios
because you don't inflict the work on the other CPUs.

https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-4.9.y-rt-rebase&id=d577a017da694e29a06af057c517f2a7051eb305

That uses local locks (an RT speciality which compile away into preempt/irq
disable/enable when RT is disabled).

Works like a charm :)

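To illustrate the idea only: below is a minimal userspace sketch, not the
code from the RT commit. It assumes each CPU's page list is protected by a
real per-CPU lock (modelled here with a pthread mutex), so that one caller
can take a remote CPU's lock and drain its list directly instead of sending
work to that CPU. All names (pcp_model, pcp_free_page, drain_cpu, drain_all,
NR_CPUS = 4) are hypothetical and exist only for this model.

```c
#include <assert.h>
#include <pthread.h>

#define NR_CPUS 4   /* hypothetical CPU count for this model */

/* Model of one CPU's private page list and the lock that guards it. */
struct pcp_model {
    pthread_mutex_t lock;   /* stands in for the per-CPU lock used on RT */
    int cached_pages;       /* pages parked on this CPU's private list */
};

static struct pcp_model pcp[NR_CPUS];
static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;
static int free_pages;      /* stands in for the shared free list */

/* Set up the per-CPU state; call once before anything else. */
static void pcp_init(void)
{
    free_pages = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        pthread_mutex_init(&pcp[cpu].lock, NULL);
        pcp[cpu].cached_pages = 0;
    }
}

/* Fast path: "cpu" parks a freed page on its own list. */
static void pcp_free_page(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    pthread_mutex_lock(&pcp[cpu].lock);
    pcp[cpu].cached_pages++;
    pthread_mutex_unlock(&pcp[cpu].lock);
}

/*
 * Drain one CPU's list into the shared pool. Because the list is guarded
 * by a real lock rather than by disabling interrupts on the owning CPU,
 * any thread may call this for any CPU.
 */
static int drain_cpu(int cpu)
{
    int n;

    assert(cpu >= 0 && cpu < NR_CPUS);
    pthread_mutex_lock(&pcp[cpu].lock);
    n = pcp[cpu].cached_pages;
    pcp[cpu].cached_pages = 0;
    pthread_mutex_unlock(&pcp[cpu].lock);

    pthread_mutex_lock(&zone_lock);
    free_pages += n;
    pthread_mutex_unlock(&zone_lock);
    return n;
}

/* Drain every CPU's list from the calling thread; returns pages moved. */
static int drain_all(void)
{
    int total = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        total += drain_cpu(cpu);
    return total;
}
```

In this model a caller runs pcp_init() once and can then call drain_all()
from any thread without involving the other CPUs. When the lock degrades to
a plain preempt/irq disable, as described above for the non-RT case, the
remote access is no longer safe, so the drain has to run on each CPU itself.
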
Thanks,
tglx