Message-ID: <alpine.DEB.2.00.1211161157390.2788@chino.kir.corp.google.com>
Date: Fri, 16 Nov 2012 12:04:32 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Glauber Costa <glommer@...allels.com>
cc: Anton Vorontsov <anton.vorontsov@...aro.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Pekka Enberg <penberg@...nel.org>,
Mel Gorman <mgorman@...e.de>,
Leonid Moiseichuk <leonid.moiseichuk@...ia.com>,
KOSAKI Motohiro <kosaki.motohiro@...il.com>,
Minchan Kim <minchan@...nel.org>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
John Stultz <john.stultz@...aro.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linaro-kernel@...ts.linaro.org,
patches@...aro.org, kernel-team@...roid.com,
linux-man@...r.kernel.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Michal Hocko <mhocko@...e.cz>,
Johannes Weiner <hannes@...xchg.org>, Tejun Heo <tj@...nel.org>
Subject: Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications
On Fri, 16 Nov 2012, Glauber Costa wrote:
> My personal take:
>
> Most people hate memcg due to the cost it imposes. I've already
> demonstrated that with some effort, it doesn't necessarily have to be
> so. (http://lwn.net/Articles/517634/)
>
> The one thing I missed in that work was precisely notifications. If you
> can come up with a good notification scheme that *lives* in memcg, but
> does not *depend* on the memcg infrastructure, I personally think it
> could be a big win.
>
This doesn't allow users of cpusets without memcg to have an API for
memory pressure. That's why I thought it should be a new cgroup that can
be mounted alongside any existing cgroup, any cgroup in the future, or
just by itself.
> Doing this in memcg has the advantage that the "per-group" vs "global"
> is automatically solved, since the root memcg is just another name for
> "global".
>
That's true of any cgroup.
> I honestly like your low/high/oom scheme better than memcg's
> "threshold-in-bytes". I would also point out that those thresholds are
> *far* from exact, due to the stock charging mechanism, and can be wrong
> by as much as O(#cpus). So far, nobody complained. So in theory it
> should be possible to convert memcg to low/high/oom, while still
> accepting writes in bytes, that would be thrown in the closest bucket.
>
I'm wondering if we should have more than three different levels.
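
To make the "closest bucket" idea quoted above concrete, here is a minimal
sketch in C of how a threshold written in bytes could be mapped onto
discrete levels. Everything in it is an assumption for illustration only:
the names pressure_level, level_bytes() and closest_level() are
hypothetical, and the 50% / 75% / 100% trip points are made up, not taken
from memcg or from the vmpressure patches. Adding more levels would only
mean extending the enum and level_bytes().

```c
/* Hypothetical level set; the thread discusses low/high/oom. */
enum pressure_level { LEVEL_LOW, LEVEL_HIGH, LEVEL_OOM, NR_LEVELS };

/*
 * Assumed usage (in bytes) at which each level would fire for a group
 * with the given limit. The fractions are illustrative, not from any
 * kernel code.
 */
static unsigned long long level_bytes(int level, unsigned long long limit)
{
	switch (level) {
	case LEVEL_LOW:
		return limit / 2;	/* 50% of the limit */
	case LEVEL_HIGH:
		return limit / 4 * 3;	/* 75% of the limit */
	default:
		return limit;		/* at the limit: oom */
	}
}

/*
 * Map a threshold written in bytes to the nearest level. On a tie the
 * lower level wins, because only a strictly smaller distance replaces
 * the current best.
 */
static enum pressure_level closest_level(unsigned long long bytes,
					 unsigned long long limit)
{
	enum pressure_level best = LEVEL_LOW;
	unsigned long long best_dist = ~0ULL;
	int level;

	for (level = 0; level < NR_LEVELS; level++) {
		unsigned long long at = level_bytes(level, limit);
		unsigned long long dist = bytes > at ? bytes - at : at - bytes;

		if (dist < best_dist) {
			best_dist = dist;
			best = level;
		}
	}
	return best;
}
```

With a limit of 1000 bytes, a write of 700 would land in the "high"
bucket under these assumed trip points, since 750 is the nearest one.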
> Another thing from one of your e-mails, that may shift you in the memcg
> direction:
>
> "2. The last time I checked, cgroups memory controller did not (and I
> guess still does not) account kernel-owned slabs. I asked several
> times why so, but nobody answered."
>
> It should, now, in the latest -mm, although it won't do per-group
> reclaim (yet).
>
Not sure where that was written, but I certainly didn't write it and it's
not really relevant in this discussion: memory pressure notifications
would be triggered by reclaim when trying to allocate memory; why we need
to reclaim or how we got into that state is tangential. It certainly may
be because a lot of slab was allocated, but that's not the only case.
> I am also failing to see how cpusets would be involved here. I
> understand that you may have free memory in terms of size, but still be
> further restricted by cpuset. But I also think that having multiple
> entry points for this buys us nothing at all. So the choices I see are:
>
Umm, why do users of cpusets not want to be able to trigger memory
pressure notifications?