Message-ID: <ZDUkAWT59seiD8+8@dhcp22.suse.cz>
Date:   Tue, 11 Apr 2023 11:10:25 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Shaun Tancheff <shaun.tancheff@...il.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Shaun Tancheff <shaun.tancheff@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] memcg: Default value setting in memcg-v1

On Thu 06-04-23 16:14:50, Shaun Tancheff wrote:
> From: Shaun Tancheff <shaun.tancheff@....com>
> 
> Setting min, low and high values with memcg-v1
> provides benefits for users that are unable to update
> to memcg-v2.

min, low and high limits are cgroup v2 concepts which are not a fit for
the v1 implementation. The primary reason why the v2 interface was
created is that existing v1 interfaces and internal constraints (most
notably the soft limit and tasks in inner nodes for memcg) were not
reformable. It is really hard to define proper semantics for memory
protection when tasks in inner nodes can compete with the hierarchy
beneath them.

> min, low and high can be set in memcg-v1
> to apply enough memory pressure to effectively throttle
> filesystem I/O without hitting memcg oom.

This is not a proper way to achieve that. As I've already stated in the
previous submission of a similar patch
(20230330202232.355471-1-shaun.tancheff@...il.com), cgroup v1 dirty data
throttling has some downsides because it cannot effectively throttle
GFP_NOFS allocations. One way around that is to reduce the dirty data
limit to prevent memcg LRUs from filling up with dirty pages. I would
recommend moving to cgroup v2 though.
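
A rough sketch of how such a reduced dirty limit could be sized. This is
only one reading of the workaround: it assumes the global vm.dirty_bytes
sysctl is the knob being lowered, and it picks a fraction of the smallest
memcg hard limit. The helper name, the 4 GiB limit and the 25% fraction
are all made up for illustration.

```python
GIB = 1024 ** 3


def dirty_bytes_for(memcg_limit_bytes: int, dirty_fraction: float) -> int:
    """Candidate vm.dirty_bytes value: a fraction of a memcg hard limit.

    Hypothetical helper, not a kernel interface. The idea is to keep the
    global dirty threshold well below the smallest memcg limit so that the
    memcg's LRUs cannot fill up with dirty pages.
    """
    if not 0 < dirty_fraction <= 1:
        raise ValueError("dirty_fraction must be in (0, 1]")
    return int(memcg_limit_bytes * dirty_fraction)


if __name__ == "__main__":
    # Assumed example: smallest memcg hard limit is 4 GiB, allow 25% dirty.
    value = dirty_bytes_for(4 * GIB, 0.25)
    # Line suitable for a sysctl configuration file.
    print(f"vm.dirty_bytes = {value}")
```

The right fraction depends on the workload; a lower value trades IO
throughput for less reclaim pressure inside the memcg.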

> This can be enabled by setting the sysctl values:
>   vm.memcg_v1_min_default
>   vm.memcg_v1_low_default
>   vm.memcg_v1_high_default
>
> When a memory control group is newly created the
> min, low and high values are set to percent of the
> maximum based on the min, low and high default
> values respectively.

This also looks like an anti-pattern in the cgroup world, for two
reasons. First of all, min and low (reclaim protection) are hierarchical,
and a global default value makes very little sense for anything other
than flat hierarchies. Even then it makes it really easy to misconfigure
the system.
Also, a percentage is a very suboptimal interface in general, as the
granularity is just too coarse for anything other than small limits.
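
To illustrate the granularity point with plain arithmetic: assuming the
proposed defaults are whole percentages of the group's maximum, the
smallest adjustable step is 1% of that maximum. The limits below are
made-up examples, and the helper name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def percent_step_bytes(limit_bytes: int) -> int:
    """Smallest change expressible with a whole-percent setting."""
    return limit_bytes // 100


if __name__ == "__main__":
    # Example maximums only: a small, a medium and a large memcg.
    for limit in (256 * MIB, 16 * GIB, 1024 * GIB):
        step = percent_step_bytes(limit)
        print(f"max {limit / GIB:8.2f} GiB -> 1% step = {step / MIB:10.2f} MiB")
```

For a 1 TiB maximum a single percent is already a bit more than 10 GiB,
so protection cannot be tuned any finer than that.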
 
> This resolves an issue with memory pressure when users
> initiate unbounded I/O on various file systems such as
> ext4, XFS and NFS.

Filesystems should still be controllable by dirty limits. This might
lead to suboptimal IO throughput, but it might be a better workaround
if you cannot afford to move to cgroup v2. The v1 interface is considered
legacy and support is limited. New features are only added if there
is absolutely no other way to keep legacy applications running.

HTH
-- 
Michal Hocko
SUSE Labs
