Message-ID: <CAJqJ8iiyVQxf1Kg_UKuRM_Zg6u4Hqb=DwpbOH_7CrAscAonD-g@mail.gmail.com>
Date: Wed, 16 Apr 2025 16:29:13 +0800
From: jingxiang zeng <jingxiangzeng.cas@...il.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Zhongkun He <hezhongkun.hzk@...edance.com>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	Johannes Weiner <hannes@...xchg.org>, Roman Gushchin <roman.gushchin@...ux.dev>, 
	Jingxiang Zeng <linuszeng@...cent.com>, akpm@...ux-foundation.org, linux-mm@...ck.org, 
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, mhocko@...nel.org, 
	muchun.song@...ux.dev, kasong@...cent.com
Subject: Re: [External] Re: [RFC 2/5] memcontrol: add boot option to enable
 memsw account on dfl

On Sat, 12 Apr 2025 at 00:57, Michal Koutný <mkoutny@...e.com> wrote:
>
> On Thu, Apr 03, 2025 at 05:16:45PM +0800, jingxiang zeng <jingxiangzeng.cas@...il.com> wrote:
> > > We encountered an issue, which is also a real use case. With memory offloading,
> > > we can move some cold pages to swap. Suppose an application’s peak memory
> > > usage at certain times is 10GB, while at other times, it exists in a
> > > combination of
> > > memory and swap. If we set limits on memory or swap separately, it would lack
> > > flexibility—sometimes it needs 1GB memory + 9GB swap, sometimes 5GB
> > > memory + 5GB swap, or even 10GB memory + 0GB swap. Therefore, we strongly
> > > hope to use the mem+swap charging method in cgroupv2
>
> App's peak need determines memory.max=10G.
> The apparent flexibility depends on how many competitors the app
> has. It can run 5GB memory + 5GB swap with some competition or 1GB
> memory + 9GB swap with different (more memory-demanding) competition.
> If you want to prevent a faulty app from eating up all of swap for itself
> (like it's possible with memsw), you may define some memory.swap.max.
> (There's no unique correspondence between this and original memsw value
> since the cost of mem<->swap is variable.)
>
>
> > Yes, in the container scenario, if swap is enabled on the server and
> > the customer's container requires 10GB of memory, we only need to set
> > memory.memsw.limit_in_bytes=10GB, and the kernel can automatically
> > swap out part of the business container's memory to swap according to
> > the server's memory pressure, and it can be fully guaranteed that the
> > customer's container will not use more memory because swap is enabled
> > on the server.
>
> This made me consider various causes of the pressure:
>
> - global pressure
>   - it doesn't change memcg's total consumption (memsw.usage=const)
>   - memsw limit does nothing
> - self-memcg pressure
>   - new allocations against own limit and memsw.usage hits memsw.limit
>   - memsw.limit prevents new allocations that would extend swap
>   - achievable with memory.swap.max=0
> - ancestral pressure
>   - when sibling needs to allocate but limit is on ancestor
>   - similar to global pressure (memsw.usage=const), self memsw.limit
>     does nothing
>
> - or there is no outer pressure but you want to prevent new allocations
>   when something has been swapped out already
>   - swapped out amount is a debt
>     - memsw.limit behavior is suboptimal until the debt needs to be
>       repaid
>       - repay is when someone else needs the swap space
>
> The above is a free flow of thoughts but I'd condense such conversions:
> - memory.max := memory.memsw.limit_in_bytes
> - memory.swap.max := anything between 0 and memory.memsw.limit_in_bytes
>
> Did I fail to capture some mode where memsw limits were superior?
>
Hi, Michal

The memsw counter is mainly useful in proactive memory offload
scenarios.

For example, the current container memory usage is as follows:
memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB

In theory, the memory.reclaim proactive reclaim interface can move
anywhere from 0GB to 9GB of that usage out to swap, so:
memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB - [0GB, 9GB]

With proactive memory offload, the amount of memory that can be
reclaimed is determined by the container's PSI and other indicators, so it
is difficult to choose an accurate memory.swap.max value in advance:
memory.swap.current = [0GB, 9GB]
memory.swap.max = ?

The memory freed by swapping out can then be used to run system
components or additional workloads:
memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB - [0GB, 9GB]
memory.swap.current = [0GB, 9GB]
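
To make the accounting above concrete, here is a toy sketch (plain Python,
not kernel code). The offload() helper is hypothetical and only models the
counters: proactive reclaim lowers the memory usage, raises the swap usage,
and leaves their sum unchanged.

```python
GB = 1 << 30

def offload(mem_usage, swap_usage, amount):
    """Toy model: move `amount` bytes of cold pages from memory to swap.

    Hypothetical helper for illustration only; it cannot offload more
    than is currently resident.
    """
    amount = min(amount, mem_usage)
    return mem_usage - amount, swap_usage + amount

# Start from the example above: 9GB resident, nothing in swap.
mem, swap = 9 * GB, 0
for step in (2 * GB, 3 * GB):
    mem, swap = offload(mem, swap, step)
    # memory usage, swap usage, and their sum, in GB
    print(mem // GB, swap // GB, (mem + swap) // GB)
# prints "7 2 9" then "4 5 9"
```

The sum stays at 9GB however much is offloaded, which is the quantity the
memsw counter tracks.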

Because proactive offload to swap lowers memory.usage_in_bytes, it
causes additional problems, such as:
1. If the container has a memory leak or receives abnormal inbound network
traffic, OOM may fail to trigger or may be triggered late;
2. In the overcommitted (oversold) scenario, if the container's memory
requirement is 10GB, the container's memory+swap together should not
exceed 10GB.

In the above scenario, the memsw counter is very useful, because it stays
at 9GB no matter how much has been offloaded:
memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB - [0GB, 9GB]

memory.memsw.limit_in_bytes = 10GB
memory.memsw.usage_in_bytes = 9GB
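
As a rough illustration of the difference (a simplified model, not the
kernel's charge path, which would attempt reclaim before failing), the
sketch below checks a new allocation against a memory-only limit and
against a combined memory+swap limit. The can_charge() helper and the
chosen numbers (6GB offloaded, a 2GB leak) are assumptions for the example.

```python
GB = 1 << 30

def can_charge(mem_usage, swap_usage, request, mem_limit, memsw_limit=None):
    """Toy model: would charging `request` bytes stay within the limits?

    Hypothetical helper; `memsw_limit=None` means only the memory limit
    is enforced.
    """
    if mem_usage + request > mem_limit:
        return False
    if memsw_limit is not None and mem_usage + swap_usage + request > memsw_limit:
        return False
    return True

# 9GB in use, of which 6GB has been proactively offloaded to swap.
mem, swap = 3 * GB, 6 * GB

# A leak now tries to allocate 2GB more.
print(can_charge(mem, swap, 2 * GB, mem_limit=10 * GB))
# prints "True": the memory-only limit does not notice the leak
print(can_charge(mem, swap, 2 * GB, mem_limit=10 * GB, memsw_limit=10 * GB))
# prints "False": 3GB + 6GB + 2GB exceeds the 10GB memsw limit
```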

That is all, thanks.
> Thanks,
> Michal
