Message-ID: <5fd73d87-3e4b-f793-1976-b937955663e3@i-love.sakura.ne.jp>
Date: Fri, 1 Feb 2019 05:59:55 +0900
From: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To: Michal Hocko <mhocko@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
Yong-Taek Lee <ytk.lee@...sung.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] mm, oom: Tolerate processes sharing mm with different
view of oom_score_adj.

On 2019/01/31 16:11, Michal Hocko wrote:
> On Thu 31-01-19 07:49:35, Tetsuo Handa wrote:
>> This patch reverts both commit 44a70adec910d692 ("mm, oom_adj: make sure
>> processes sharing mm have same view of oom_score_adj") and commit
>> 97fd49c2355ffded ("mm, oom: kill all tasks sharing the mm") in order to
>> close a race and reduce the latency at __set_oom_adj(), and reduces the
>> warning at __oom_kill_process() in order to minimize the latency.
>>
>> Commit 36324a990cf578b5 ("oom: clear TIF_MEMDIE after oom_reaper managed
>> to unmap the address space") introduced the worst case mentioned in
>> 44a70adec910d692. But since the OOM killer skips mm with MMF_OOM_SKIP set,
>> only administrators can trigger the worst case.
>>
>> Since 44a70adec910d692 did not take latency into account, we can "hold RCU
>> for minutes and trigger RCU stall warnings" by calling printk() on many
>> thousands of thread groups. Also, the current code is a DoS attack vector
>> that allows "stalling for more than one month in unkillable state" simply
>> by printk()ing the same messages when many thousands of thread groups try
>> to iterate __set_oom_adj() on each other.
>>
>> I also noticed that 44a70adec910d692 is racy [1], and fixing the race
>> would require a global lock, which is too costly for rare events. And
>> Michal Hocko is thinking of changing the oom_score_adj implementation to
>> be per mm_struct (with a shadowed score stored per task_struct in order to
>> support the vfork() => __set_oom_adj() => execve() sequence) so that we
>> don't need the global lock.
>>
>> If the worst case in 44a70adec910d692 happens, it is at an administrator's
>> request. Therefore, before changing the oom_score_adj implementation,
>> let's eliminate the DoS attack vector first.
>
> This is really ridiculous. I have already nacked the previous version
> and provided two ways around. The simplest one is to drop the printk.
> The second one is to move oom_score_adj to the mm struct. Could you
> explain why you still push for this?

Dropping printk() does not close the race.
You must propose an alternative patch if you dislike this patch.

>
>> [1] https://lkml.kernel.org/r/20181008011931epcms1p82dd01b7e5c067ea99946418bc97de46a@epcms1p8
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
>> Reported-by: Yong-Taek Lee <ytk.lee@...sung.com>
>> Nacked-by: Michal Hocko <mhocko@...e.com>
>> ---
>> fs/proc/base.c | 46 ----------------------------------------------
>> include/linux/mm.h | 2 --
>> mm/oom_kill.c | 10 ++++++----
>> 3 files changed, 6 insertions(+), 52 deletions(-)
>>