lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkZ_jNuYQsGMyS1NgMf335Gi4_x5Ybkts_=+g5OyjtJQDQ@mail.gmail.com>
Date: Wed, 14 Aug 2024 13:22:01 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: Mike Yuan <me@...dnzj.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	cgroups@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>, 
	Muchun Song <muchun.song@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Michal Hocko <mhocko@...nel.org>, 
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH] mm/memcontrol: respect zswap.writeback setting from
 parent cg too

On Wed, Aug 14, 2024 at 12:52 PM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Wed, Aug 14, 2024 at 10:20 AM Mike Yuan <me@...dnzj.com> wrote:
> >
> > Currently, the behavior of zswap.writeback wrt.
> > the cgroup hierarchy seems a bit odd. Unlike zswap.max,
> > it doesn't honor the value from parent cgroups. This
> > surfaced when people tried to globally disable zswap writeback,
> > i.e. reserve physical swap space only for hibernation [1] -
> > disabling zswap.writeback only for the root cgroup results
> > in subcgroups with zswap.writeback=1 still performing writeback.
> >
> > The consistency became more noticeable after I introduced
> > the MemoryZSwapWriteback= systemd unit setting [2] for
> > controlling the knob. The patch assumed that the kernel would
> > enforce the value of parent cgroups. It could probably be
> > workarounded from systemd's side, by going up the slice unit
> > tree and inherit the value. Yet I think it's more sensible
> > to make it behave consistently with zswap.max and friends.
>
> May I ask you to add/clarify this new expected behavior in
> Documentation/admin-guide/cgroup-v2.rst?
>
> >
> > [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
>
> This is an interesting use case. Never envisioned this when I
> developed this feature :)
>
> > [2] https://github.com/systemd/systemd/pull/31734
> >
> > Signed-off-by: Mike Yuan <me@...dnzj.com>
> > ---
>
> Personally, I don't feel too strongly about this one way or another. I
> guess you can make a case that people want to disable zswap writeback
> by default, and only selectively enable it for certain descendant
> workloads - for convenience, they would set memory.zswap.writeback ==
> 0 at root, then enable it on selected descendants?
>
> It's not super expensive IMHO - we already perform upward traversal on
> every zswap store. This wouldn't be the end of the world.
>
> Yosry, Johannes - how do you two feel about this?

I wasn't CC'd on this, but found it by chance :) I think there is a
way for the zswap maintainers entry to match any patch that mentions
"zswap", not just based on files, right?

Anyway, both use cases make sense to me, disabling writeback
system-wide or in an entire subtree, and disabling writeback on the
root and then selectively enabling it. I am slightly inclined to the
first one (what this patch does).

Considering the hierarchical cgroup knobs work, we usually use the
most restrictive limit among the ancestors. I guess it ultimately
depends on how we define "most restrictive". Disabling writeback is
restrictive in the sense that you don't have access to free some zswap
space to reclaim more memory. OTOH, disabling writeback also means
that your zswapped memory won't go to disk under memory pressure, so
in that sense it would be restrictive to force writeback :)

Usually, the "default" is the non-restrictive thing, and then you can
set restrictions that apply to all children (e.g. no limits are set by
default). Since writeback is enabled by default, it seems like the
restriction would be disabling writeback. Hence, it would make sense
to inherit zswap disabling (i.e. only writeback if all ancestors allow
it, like this patch does).

What we do today dismisses inheritance completely, so it seems to me
like it should be changed anyway.

I am thinking out loud here, let me know if my reasoning makes sense to you.

>
> Code looks solid to me - I think the upward tree traversal should be
> safe, as long as memcg is valid (since memcg holds reference to its
> parent IIRC).
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ