linux-kernel - Re: [PATCH v2 2/3] mm: Force update of mem cgroup soft limit tree on usage excess


Message-ID: <b5b1944d-846b-3212-fe4a-f10f5fcb87d7@linux.intel.com>
Date:   Thu, 25 Feb 2021 14:48:58 -0800
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Ying Huang <ying.huang@...el.com>, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/3] mm: Force update of mem cgroup soft limit tree on
 usage excess



On 2/24/21 3:53 AM, Michal Hocko wrote:
> On Mon 22-02-21 11:48:37, Tim Chen wrote:
>>
>>
>> On 2/22/21 11:09 AM, Michal Hocko wrote:
>>
>>>>
>>>> I actually have tried adjusting the threshold but found that it doesn't work well for
>>>> the case with uneven memory access frequency between cgroups.  The soft
>>>> limit for the low memory event cgroup could creep up quite a lot, exceeding
>>>> the soft limit by hundreds of MB, even
>>>> if I drop the SOFTLIMIT_EVENTS_TARGET from 1024 to something like 8.
>>>
>>> What was the underlying reason? Higher order allocations?
>>>
>>
>> Not high order allocation.
>>
>> The reason was that the runaway memcg asks for memory much less often, compared
>> to the other memcgs in the system.  So it escapes the sampling update and
>> was not put onto the tree and exceeds the soft limit
>> pretty badly.  Even if it was put onto the tree and gets page reclaimed below the
>> limit, it could escape the sampling the next time it exceeds the soft limit.
> 
> I am sorry but I really do not follow. Maybe I am missing something
> obvious but the rate of events (charge/uncharge) shouldn't be really
> important. There is no way to exceed the limit without charging memory
> (either a new or via task migration in v1 and immigrate_on_move). If you
> have SOFTLIMIT_EVENTS_TARGET 8 then you should need 128 * 8 events to
> re-evaluate. Huge pages can make the runaway much bigger but how would
> it be possible to run away outside of that bound?


Michal,

Let's take an extreme case where memcg 1 always generates the
first event and memcg 2 generates the remaining 128*8-1 events,
and the pattern repeats.  The tree update happens on the 128*8th event,
so memcg 1 never triggers it.  In this case we will
keep missing memcg 1's events and never put memcg 1 on the tree.

Something like this pattern of memory events:


cg1 cg2 cg2 cg2 ....cg2 cg1 cg2 cg2 cg2....cg2 cg1 cg2 .....
                     ^                      ^
                  update tree              update tree

Of course, in real life the events are random in nature.
However, because memcg 1 events occur so rarely, each tree update has
a low probability of landing on one of them, and we can miss updating
memcg 1 for a long time.
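
The pattern above can be sketched as a toy model. This is only an
illustration of the example, not the kernel code path: it assumes a
single event counter shared by both memcgs (as the example does), and
the function names and the p_cg1 probability are hypothetical.

```python
import random

PERIOD = 128 * 8  # events between tree updates, as in the example


def updates_by_cgroup(events, period=PERIOD):
    """Count, per cgroup, how many tree updates its events trigger.

    Assumption: one shared event counter; every period-th event
    triggers the update for the cgroup that generated it.
    """
    hits = {}
    for i, cg in enumerate(events, start=1):
        if i % period == 0:
            hits[cg] = hits.get(cg, 0) + 1
    return hits


def deterministic_pattern(rounds, period=PERIOD):
    """cg1 generates the first event of each period, cg2 the rest."""
    return (["cg1"] + ["cg2"] * (period - 1)) * rounds


def random_pattern(n_events, p_cg1, seed=0):
    """Each event comes from cg1 with probability p_cg1, else cg2."""
    rng = random.Random(seed)
    return ["cg1" if rng.random() < p_cg1 else "cg2" for _ in range(n_events)]


# Extreme case: cg1 never lands on an update slot.
det = updates_by_cgroup(deterministic_pattern(rounds=100))
print(det)  # {'cg2': 100}

# Random case: cg1 is rare, so it seldom (if ever) triggers an update.
rnd = updates_by_cgroup(random_pattern(n_events=100 * PERIOD, p_cg1=0.001))
print(rnd)
```

In the deterministic case every update is attributed to cg2.  In the
random case the share of updates that land on cg1 tracks p_cg1, so a
rarely charging memcg can go a long time without being re-evaluated.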

> 
> Btw. do we really need SOFTLIMIT_EVENTS_TARGET at all? Why cannot we
> just stick with a single threshold? mem_cgroup_update_tree can be made
> effectively a noop when there is no soft limit in place so overhead
> shouldn't matter for the vast majority of workloads.
> 

I think there are two limits because the original code wants
mem_cgroup_threshold to run more frequently than the
soft limit tree is updated.  The soft limit tree update is more costly.

Tim