Message-ID: <1501554199.5269.22.camel@gmx.de>
Date: Tue, 01 Aug 2017 04:23:19 +0200
From: Mike Galbraith <efault@....de>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...com
Subject: Re: [PATCH 3/3] mm/sched: memdelay: memory health interface for
systems and workloads

On Mon, 2017-07-31 at 16:38 -0400, Johannes Weiner wrote:
> On Mon, Jul 31, 2017 at 09:49:39PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-07-31 at 14:41 -0400, Johannes Weiner wrote:
> > >
> > > Adding an rq counter for tasks inside memdelay sections should be
> > > straight-forward as well (except for maybe the migration cost of that
> > > state between CPUs in ttwu that Mike pointed out).
> >
> > What I pointed out should be easily eliminated (in the zero-use case).
>
> How so?
I was thinking along the lines of schedstat_enabled().
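Roughly this shape -- a sketch only, the memdelay_* names below are
invented here and not from the series; only the static key plumbing
(<linux/jump_label.h>) mirrors what schedstat_enabled() does:

	/*
	 * Sketch: memdelay_enabled_key, memdelay_rq_inc() and
	 * rq->nr_memdelay are invented names.  The static key is the
	 * same mechanism schedstat_enabled() sits on.
	 */
	DEFINE_STATIC_KEY_FALSE(memdelay_enabled_key);

	#define memdelay_enabled() \
		static_branch_unlikely(&memdelay_enabled_key)

	static inline void memdelay_rq_inc(struct rq *rq)
	{
		if (!memdelay_enabled())
			return;
		rq->nr_memdelay++;	/* invented per-rq counter */
	}

The interface code would flip static_branch_enable(&memdelay_enabled_key)
only when somebody actually opens the thing, so the hot paths pay a
patched-out branch otherwise.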
> > > That leaves the question of how to track these numbers per cgroup at
> > > an acceptable cost. The idea for a tree of cgroups is that walltime
> > > impact of delays at each level is reported for all tasks at or below
> > > that level. E.g. a leaf group aggregates the state of its own tasks,
> > > the root/system aggregates the state of all tasks in the system; hence
> > > the propagation of the task state counters up the hierarchy.
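(Concretely, the propagation described above works out to roughly the
loop below on every task state change -- memdelay_cgroup, counts[] and
the state names are invented for illustration, not taken from the
series.)

	/*
	 * Illustration only.  The point is the shape of the cost: one
	 * walk to the root per task state change, on paths like
	 * ttwu/dequeue.  Uses <linux/atomic.h>.
	 */
	enum md_state { MD_NONE, MD_SOME, MD_FULL, NR_MD_STATES };

	struct memdelay_cgroup {
		struct memdelay_cgroup *parent;
		atomic_t counts[NR_MD_STATES];
	};

	static void md_change_state(struct memdelay_cgroup *grp,
				    enum md_state old_state,
				    enum md_state new_state)
	{
		for (; grp; grp = grp->parent) {
			atomic_dec(&grp->counts[old_state]);
			atomic_inc(&grp->counts[new_state]);
		}
	}

Whether that ends up as atomics, per-cpu counters or something cleverer,
somebody pays at every state change, on every level of the hierarchy.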
> >
> > The crux of the biscuit is where exactly the investment return lies.
> > Gathering of these numbers ain't gonna be free, no matter how hard you
> > try, and you're plugging into paths where every cycle added is made of
> > userspace hide.
>
> Right. But how to implement it sanely and optimize for cycles, and
> whether we want to default-enable this interface are two separate
> conversations.
>
> It makes sense to me to first make the implementation as lightweight
> as possible in terms of cycles and maintenance burden, and then worry
> about the cost / benefit defaults of the shipped Linux kernel.
>
> That goes for the purely informative userspace interface, anyway. The
> easily-provoked thrashing livelock I have described in the email to
> Andrew is a different matter. If the OOM killer requires hooking up to
> this metric to fix it, it won't be optional. But the OOM code isn't
> part of this series yet, so again a conversation best had later, IMO.
If that "the many must pay a toll to save the few" conversation ever
happens, just recall me registering my boo/hiss in advance. I don't
have to feel guilty about not liking the idea of making donations to
feed the poor starving proggies ;-)
-Mike