Message-ID: <20180313133946.GT12772@dhcp22.suse.cz>
Date: Tue, 13 Mar 2018 14:39:46 +0100
From: Michal Hocko <mhocko@...nel.org>
To: David Rientjes <rientjes@...gle.com>
Cc: Gaurav Kohli <gkohli@...eaurora.org>,
Andrew Morton <akpm@...ux-foundation.org>,
kirill.shutemov@...ux.intel.com,
Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org
Subject: Re: [patch] mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
On Wed 07-03-18 15:52:15, David Rientjes wrote:
> Since the 2.6 kernel, the oom killer has slightly biased away from
> CAP_SYS_ADMIN processes by discounting some of its memory usage in
> comparison to other processes.
>
> This has always been implicit and nothing exactly relies on the behavior.
>
> Gaurav notices that __task_cred() can dereference a potentially freed
> pointer if the task under consideration is exiting because a reference to
> the task_struct is not held.
>
> Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.
>
> If any CAP_SYS_ADMIN process would like to be biased against, it is always
> allowed to adjust /proc/pid/oom_score_adj.
>
> Reported-by: Gaurav Kohli <gkohli@...eaurora.org>
> Signed-off-by: David Rientjes <rientjes@...gle.com>
This is simpler than playing reference counting tricks and whatnot.
Moreover, I agree that this heuristic is questionable on its own. The
bias is basically random and invisible to userspace, and we already have
a way to tune the same thing via oom_score_adj.
Acked-by: Michal Hocko <mhocko@...e.com>
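As a rough illustration of the oom_score_adj route mentioned above, a process can write a value in the range -1000..1000 to its own oom_score_adj file. The helper name write_oom_score_adj() and its path parameter are hypothetical and exist only for this sketch; note that lowering the value normally requires CAP_SYS_RESOURCE.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of any kernel or libc API: write an
 * oom_score_adj value to the given file. On a real system the path
 * would be /proc/self/oom_score_adj or /proc/<pid>/oom_score_adj.
 * Returns 0 on success, -1 on a bad value or an I/O error.
 */
int write_oom_score_adj(const char *path, int adj)
{
	FILE *f;
	int ok;

	/* The kernel accepts values from -1000 to 1000 inclusive. */
	if (adj < -1000 || adj > 1000)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", adj) > 0;
	/* The write may only be reported as failed when the stream is closed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A privileged daemon that wants to keep a small bias in its favour could call, for example, write_oom_score_adj("/proc/self/oom_score_adj", -30) at startup. Unlike the removed bonus, that setting is explicit and can be read back from userspace.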
> ---
> mm/oom_kill.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -224,13 +224,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> mm_pgtables_bytes(p->mm) / PAGE_SIZE;
> task_unlock(p);
>
> - /*
> - * Root processes get 3% bonus, just like the __vm_enough_memory()
> - * implementation used by LSMs.
> - */
> - if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> - points -= (points * 3) / 100;
> -
> /* Normalize to oom_score_adj units */
> adj *= totalpages / 1000;
> points += adj;
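For readers following the hunk, the arithmetic can be modelled in userspace. This is a sketch, not kernel code: the function names apply_admin_bonus() and add_score_adj() are hypothetical, and only the two expressions visible in the hunk are reproduced. Anything oom_badness() does before or after them is not modelled.

```c
#include <assert.h>

/*
 * Hypothetical model of the removed heuristic: discount 3% of the
 * task's own points, using the same integer arithmetic as the hunk.
 */
unsigned long apply_admin_bonus(unsigned long points)
{
	return points - (points * 3) / 100;
}

/*
 * Hypothetical model of the lines that remain: scale oom_score_adj
 * (range -1000..1000) to units of totalpages / 1000 and add it to
 * the task's points.
 */
long add_score_adj(unsigned long points, long adj, unsigned long totalpages)
{
	assert(adj >= -1000 && adj <= 1000);

	adj *= (long)(totalpages / 1000);
	return (long)points + adj;
}
```

With totalpages = 1000000 and points = 500000, apply_admin_bonus(500000) and add_score_adj(500000, -15, 1000000) both yield 485000. The difference is that the bonus scaled with the task's own usage, while oom_score_adj scales with total memory and is chosen explicitly by userspace.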
--
Michal Hocko
SUSE Labs