linux-kernel - Re: [PATCH memcg 3/3] memcg: handle memcg oom failures

Message-ID: <b618ac5c-e982-c4af-ecf3-564b8de52c8c@virtuozzo.com>
Date:   Thu, 21 Oct 2021 18:05:28 +0300
From:   Vasily Averin <vvs@...tuozzo.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>,
        Uladzislau Rezki <urezki@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Shakeel Butt <shakeelb@...gle.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel@...nvz.org
Subject: Re: [PATCH memcg 3/3] memcg: handle memcg oom failures

On 21.10.2021 14:49, Michal Hocko wrote:
> I do understand that handling a very specific case sounds easier but it
> would be better to have a robust fix even if that requires some more
> head scratching. So far we have collected several reasons why it is
> bad to trigger oom killer from the #PF path. There is no single argument
> to keep it so it sounds like a viable path to pursue. Maybe there are
> some very well hidden reasons but those should be documented and this is
> a great opportunity to do either step.
> 
> Moreover if it turns out that there is a regression then this can be
> easily reverted and a different, maybe memcg specific, solution can be
> implemented.

I agree now;
however, I still have a few open questions.

1) VM_FAULT_OOM may be triggered without out_of_memory() ever being executed.
For example, it can be caused by incorrect vm fault operations:
(a) which can return this error without calling the allocator at all;
(b) or which can pass incorrect gfp flags, so that the allocator fails without executing out_of_memory();
(c) this may also happen on stable/LTS kernels when an otherwise successful allocation fails because it hits the limit of the legacy memcg-kmem controller.
We'll drop it in upstream kernels, but how should we handle it in old kernels?

We can check whether out_of_memory() or the allocator was called by setting some per-task flags.
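
To make the per-task flag idea concrete, here is a minimal userspace sketch, not kernel code. Everything in it is hypothetical: struct task_model, its two flags, and classify_fault_oom() are names invented for illustration and do not exist in the kernel. It only shows how such flags could tell cases (a), (b) and (c) apart from a fault where the OOM killer really ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task state; no such fields exist in struct task_struct. */
struct task_model {
	bool allocator_called;	/* page allocator was entered during this fault */
	bool oom_invoked;	/* out_of_memory() was executed during this fault */
};

enum fault_verdict {
	FAULT_RETRY,		/* OOM killer already ran: just retry the fault */
	FAULT_NO_OOM_ATTEMPT,	/* allocator failed without out_of_memory(): (b), (c) */
	FAULT_NO_ALLOCATION	/* VM_FAULT_OOM without any allocation: (a) */
};

/*
 * Model of the decision a fault-OOM handler could make from the flags.
 * The flags are cleared afterwards so the next fault starts clean.
 */
enum fault_verdict classify_fault_oom(struct task_model *t)
{
	enum fault_verdict v;

	assert(t != NULL);

	if (t->oom_invoked)
		v = FAULT_RETRY;
	else if (t->allocator_called)
		v = FAULT_NO_OOM_ATTEMPT;
	else
		v = FAULT_NO_ALLOCATION;

	t->allocator_called = false;
	t->oom_invoked = false;
	return v;
}
```

In this model only FAULT_RETRY corresponds to "the oom killer already did what it could"; the other two verdicts are the situations where the task would otherwise loop.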

Can pagefault_out_of_memory() send itself a SIGKILL in all these cases?

If not, the task will loop.
That is much better than triggering the global OOM killer, but it would be even better to avoid it somehow.

You said: "We cannot really kill the task if we could we would have done it by the oom killer already".
However, what should we do if we did not even try to use the oom-killer (see (b) and (c)),
or if we did not use the allocator at all (see (a))?

2) In your patch we just exit from pagefault_out_of_memory() and restart a new #PF.
We could call schedule_timeout() and wait some time before the new #PF restart.
Additionally, we could increase this delay on each new cycle.
This helps to save CPU time for other tasks.
What do you think about this?
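
As a rough illustration of the growing delay, here is a small userspace sketch of the arithmetic only. The constants and next_backoff_ms() are assumptions made up for this example; the actual sleep (schedule_timeout() or similar) and where the retry counter would live are not modeled.

```c
#include <assert.h>

/* Hypothetical bounds, chosen only for illustration. */
#define BACKOFF_BASE_MS	1u
#define BACKOFF_MAX_MS	1000u
#define BACKOFF_MAX_SHIFT	10u	/* 1 << 10 already exceeds the cap */

static_assert(BACKOFF_BASE_MS <= BACKOFF_MAX_MS, "base delay must not exceed cap");

/*
 * Delay to wait before restarting the #PF, given how many times the
 * fault has already been retried. Doubles on each cycle and saturates
 * at BACKOFF_MAX_MS so the task still makes progress checks regularly.
 */
unsigned int next_backoff_ms(unsigned int retries)
{
	unsigned int delay;

	/* Avoid shifting past the cap (and past the width of the type). */
	if (retries >= BACKOFF_MAX_SHIFT)
		return BACKOFF_MAX_MS;

	delay = BACKOFF_BASE_MS << retries;
	return delay > BACKOFF_MAX_MS ? BACKOFF_MAX_MS : delay;
}
```

With these assumed values the delays run 1, 2, 4, ... 512 ms and then stay at 1000 ms, so a looping task quickly stops burning CPU while still retrying about once a second.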

Thank you,
	Vasily Averin