Message-ID: <1501554199.5269.22.camel@gmx.de>
Date: Tue, 01 Aug 2017 04:23:19 +0200
From: Mike Galbraith <efault@....de>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...com
Subject: Re: [PATCH 3/3] mm/sched: memdelay: memory health interface for
systems and workloads

On Mon, 2017-07-31 at 16:38 -0400, Johannes Weiner wrote:
> On Mon, Jul 31, 2017 at 09:49:39PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-07-31 at 14:41 -0400, Johannes Weiner wrote:
> > >
> > > Adding an rq counter for tasks inside memdelay sections should be
> > > straight-forward as well (except for maybe the migration cost of that
> > > state between CPUs in ttwu that Mike pointed out).
> >
> > What I pointed out should be easily eliminated (in the zero-use case).
>
> How so?
I was thinking along the lines of schedstat_enabled().
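Roughly this shape -- a sketch only, the memdelay_* names below are
invented here and not from the series; only the static key plumbing
(<linux/jump_label.h>) mirrors what schedstat_enabled() does:

	/*
	 * Sketch: memdelay_enabled_key, memdelay_rq_inc() and
	 * rq->nr_memdelay are invented names.  The static key is the
	 * same mechanism schedstat_enabled() sits on.
	 */
	DEFINE_STATIC_KEY_FALSE(memdelay_enabled_key);

	#define memdelay_enabled() \
		static_branch_unlikely(&memdelay_enabled_key)

	static inline void memdelay_rq_inc(struct rq *rq)
	{
		if (!memdelay_enabled())
			return;
		rq->nr_memdelay++;	/* invented per-rq counter */
	}

The interface code would flip static_branch_enable(&memdelay_enabled_key)
only when somebody actually opens the thing, so the hot paths pay a
patched-out branch otherwise.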
> > > That leaves the question of how to track these numbers per cgroup at
> > > an acceptable cost. The idea for a tree of cgroups is that walltime
> > > impact of delays at each level is reported for all tasks at or below
> > > that level. E.g. a leaf group aggregates the state of its own tasks,
> > > the root/system aggregates the state of all tasks in the system; hence
> > > the propagation of the task state counters up the hierarchy.
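(Concretely, the propagation described above works out to roughly the
loop below on every task state change -- memdelay_cgroup, counts[] and
the state names are invented for illustration, not taken from the
series.)

	/*
	 * Illustration only.  The point is the shape of the cost: one
	 * walk to the root per task state change, on paths like
	 * ttwu/dequeue.  Uses <linux/atomic.h>.
	 */
	enum md_state { MD_NONE, MD_SOME, MD_FULL, NR_MD_STATES };

	struct memdelay_cgroup {
		struct memdelay_cgroup *parent;
		atomic_t counts[NR_MD_STATES];
	};

	static void md_change_state(struct memdelay_cgroup *grp,
				    enum md_state old_state,
				    enum md_state new_state)
	{
		for (; grp; grp = grp->parent) {
			atomic_dec(&grp->counts[old_state]);
			atomic_inc(&grp->counts[new_state]);
		}
	}

Whether that ends up as atomics, per-cpu counters or something cleverer,
somebody pays at every state change, on every level of the hierarchy.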
> >
> > The crux of the biscuit is where exactly the investment return lies.
> > Gathering of these numbers ain't gonna be free, no matter how hard you
> > try, and you're plugging into paths where every cycle added is made of
> > userspace hide.
>
> Right. But how to implement it sanely and optimize for cycles, and
> whether we want to default-enable this interface are two separate
> conversations.
>
> It makes sense to me to first make the implementation as lightweight
> as possible in terms of cycles and maintenance burden, and then worry
> about the cost / benefit defaults of the shipped Linux kernel.
>
> That goes for the purely informative userspace interface, anyway. The
> easily-provoked thrashing livelock I have described in the email to
> Andrew is a different matter. If the OOM killer requires hooking up to
> this metric to fix it, it won't be optional. But the OOM code isn't
> part of this series yet, so again a conversation best had later, IMO.
If that "the many must pay a toll to save the few" conversation ever
happens, just recall me registering my boo/hiss in advance. I don't
have to feel guilty about not liking the idea of making donations to
feed the poor starving proggies ;-)
-Mike