linux-kernel - Re: [PATCH memcg] memcg: prohibit unconditional exceeding the limit of dying tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YTtx3toUOMLXk4GZ@dhcp22.suse.cz>
Date:   Fri, 10 Sep 2021 16:55:26 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Vasily Averin <vvs@...tuozzo.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH memcg] memcg: prohibit unconditional exceeding the limit
 of dying tasks

On Fri 10-09-21 16:20:58, Vasily Averin wrote:
> On 9/10/21 4:04 PM, Tetsuo Handa wrote:
> > On 2021/09/10 21:39, Vasily Averin wrote:
> >> The kernel currently allows dying tasks to exceed the memcg limits.
> >> The allocation is expected to be the last one and the occupied memory
> >> will be freed soon.
> >> This is not always true because it can be part of the huge vmalloc
> >> allocation. Allowed once, they will repeat over and over again.
> >> Moreover lifetime of the allocated object can differ from
> >> In addition the lifetime of the dying task.
> > 
> > Can't we add fatal_signal_pending(current) test to vmalloc() loop?

We can and we should.

> 1) this has been done in the past but has been reverted later.

The reason for that should be addressed IIRC.

> 2) any vmalloc changes will affect non-memcg allocations too.
>  If we're doing memcg-related checks it's better to do it in one place.

I think those two things are just orthogonal. Bailing out from vmalloc
early sounds reasonable to me on its own. Allocating a large thing that
is likely to go away with the allocating context is just a waste of
resources and potential reason to disruptions to others.

> 3) it is not vmalloc-only issue. Huge number of  kmalloc page allocations 
> from N concurrent threads will lead to the same problem. 

Agreed. I do not think it is viable to sprinkle fatal_signal_pending or
similar checks all over the code. This should better be limited to
allocators and the charging function.

Our assumption that is described in the code simply doesn't hold and it
is close to impossible to check all charging paths to bail out properly
so I think we should just remove that optimistic attitude and do not
force charges unless that is absolutely necessary (e.g. __GFP_NOFAIL) or
impractical (e.g. atomic allocations) and bound.

I didn't get to review the patch yet. This is a tricky area with many
unobvious corner cases. I will try find some time next week.

Thanks!
-- 
Michal Hocko
SUSE Labs