[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKEwX=N3m3cTkyh3ebA+79_oOEe4QMDbU2BxjEJCgfRanX9g9Q@mail.gmail.com>
Date: Thu, 26 Sep 2024 18:52:39 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Ivan Shapovalov <intelfx@...elfx.name>
Cc: linux-kernel@...r.kernel.org, Mike Yuan <me@...dnzj.com>, Tejun Heo <tj@...nel.org>,
Zefan Li <lizefan.x@...edance.com>, Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
Jonathan Corbet <corbet@....net>, Yosry Ahmed <yosryahmed@...gle.com>,
Chengming Zhou <chengming.zhou@...ux.dev>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>,
Chris Li <chrisl@...nel.org>, cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH] zswap: improve memory.zswap.writeback inheritance
On Thu, Sep 26, 2024 at 4:40 PM Ivan Shapovalov <intelfx@...elfx.name> wrote:
>
> On 2024-09-26 at 16:12 -0700, Nhat Pham wrote:
> > On Thu, Sep 26, 2024 at 3:55 PM Ivan Shapovalov <intelfx@...elfx.name> wrote:
> > >
> > > Improve the inheritance behavior of the `memory.zswap.writeback` cgroup
> > > attribute introduced during the 6.11 cycle. Specifically, in 6.11 we
> > > walk the parent cgroups until we find a _disabled_ writeback, which does
> > > not allow the user to selectively enable zswap writeback while having it
> > > disabled by default.
> >
> > Is there an actual need for this? This is a theoretical use case I
> > thought of (and raised), but I don't think anybody actually wants
> > this...?
>
> This is of course anecdata, but yes, it does solve a real use-case that
> I'm having right now, as well as a bunch of my colleagues who recently
> complained to me (in private) about pretty much the same use-case.
>
> The use-case is following: it turns out that it could be beneficial for
> desktop systems to run with a pretty high swappiness and zswap
> writeback globally disabled, to nudge the system to compress cold pages
> but not actually write them back to the disk (which would happen pretty
> aggressively if it was not disabled) to reduce I/O and latencies.
> However, under this setup it is sometimes needed to re-enable zswap
> writeback for specific memory-heavy applications that allocate a lot of
> cold pages, to "allow" the kernel to actually swap those programs out.
Out of pure curiosity (and to make sure I fully grasp the problem at
hand), is this the capacity-based zswap writeback (i.e the one
triggered by limits), or the memory pressure based dynamic shrinker?
If you disable the latter and only rely on the former, it should not
"write pages aggressively". Limits are rarely reached (IIUC, zswap.max
are frequently used as binary knobs, and global limits are hard to
reach), so usually pages that are going to disk swap are just pages
zswap reject (i.e mostly just pages that fail to compress). This
should be a very small category. You will still see "swap" usage due
to a quirk in zswap architecture (which I'm working to fix), but there
should rarely be any IOs. So the setup itself is not super necessary.
If it's the latter then yeah I can kinda see the need for the setup.
>
> >
> > Besides, most people who want this can just:
> >
> > 1. Enable zswap writeback on root cgroup (and all non-leaf cgroups).
> >
> > 2. Disable zswap writeback on leaf cgroups on creation by default.
> >
> > 3. Selectively enable zswap writeback for the leaf cgroups.
> >
> > All of this is quite doable in userspace. It's not even _that_ racy -
> > just do this before adding tasks etc. to the cgroup?
>
> Well, yes, it is technically doable from userspace, just like it was
> technically doable prior to commit e39925734909 to have userspace
> explicitly control the entire hierarchy of writeback settings.
> However it _is_ pretty painful, and the flow you described would
> essentially negate any benefits of that patch (it would require
> userspace to, once again, manage the entire hierarchy explicitly
> without any help from the kernel).
I think it's a tad different. In the case of the commit e39925734909,
the hierarchical behavior of zswap.writeback knob is quite
semantically confusing, almost counter-intuitive (and does not conform
to the convention of other cgroup knobs, which use the most
restrictive limit - check out zswap.max limit checking for example).
That commit arguably fixes it for the "common" case (i.e you want the
hierarchical enforcement to hold for the most part). That's why there
were even conversations about cc-ing the stable mailing list for
backporting it to older kernels.
This is more of a "new use case" patch. It complicates the API, for
something readily doable in userspace - the kernel does not do
anything that userspace cannot achieve. So it should undergo stricter
scrutiny. :)
Anyway, Yosry, Johannes, how do you feel about this?
Powered by blists - more mailing lists