lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <6599ad830907141752n5c701f9ap2a87df0a7f4f6ff3@mail.gmail.com>
Date:	Tue, 14 Jul 2009 17:52:28 -0700
From:	Paul Menage <menage@...gle.com>
To:	linux-kernel@...r.kernel.org
Cc:	mm-commits@...r.kernel.org, rientjes@...gle.com, mel@....ul.ie,
	npiggin@...e.de, riel@...hat.com
Subject: Re: + oom-move-oom_adj-value-from-task_struct-to-mm_struct.patch 
	added to -mm tree

On Mon, Jun 1, 2009 at 10:57 PM, <akpm@...ux-foundation.org> wrote:
>
> The patch titled
>     oom: move oom_adj value from task_struct to mm_struct
> has been added to the -mm tree.  Its filename is
>     oom-move-oom_adj-value-from-task_struct-to-mm_struct.patch

This patch represents an API/semantics change that's invisible from
userspace, unless userspace specifically checks for the old and new
behaviours to see which applies.

Is it intentional that it's now impossible to create a tree of
processes that will inherit a non-zero oom_adj? It appears to get
reset whenever you exec a child process.

Old behaviour:

[root@...alhost ~]# echo -1 > /proc/$$/oom_adj
[root@...alhost ~]# cat /proc/$$/oom_adj
-1
[root@...alhost ~]# bash
[root@...alhost ~]# cat /proc/$$/oom_adj
-1

New behaviour:

[root@...alhost ~]# echo -1 > /proc/$$/oom_adj
[root@...alhost ~]# cat /proc/$$/oom_adj
-1
[root@...alhost ~]# bash
[root@...alhost ~]# cat /proc/$$/oom_adj
0

>
> Before you just go and hit "reply", please:
>   a) Consider who else should be cc'ed
>   b) Prefer to cc a suitable mailing list as well
>   c) Ideally: find the original patch on the mailing list and do a
>      reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
> out what to do about this
>
> The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/
>
> ------------------------------------------------------
> Subject: oom: move oom_adj value from task_struct to mm_struct
> From: David Rientjes <rientjes@...gle.com>
>
> The per-task oom_adj value is a characteristic of its mm more than the
> task itself since it's not possible to oom kill any thread that shares the
> mm.  If a task were to be killed while attached to an mm that could not be
> freed because another thread were set to OOM_DISABLE, it would have
> needlessly been terminated since there is no potential for future memory
> freeing.
>
> This patch moves oomkilladj (now more appropriately named oom_adj) from
> struct task_struct to struct mm_struct.  This requires task_lock() on a
> task to check its oom_adj value to protect against exec, but it's already
> necessary to take the lock when dereferencing the mm to find the total VM
> size for the badness heuristic.
>
> This fixes a livelock if the oom killer chooses a task and another thread
> sharing the same memory has an oom_adj value of OOM_DISABLE.  This occurs
> because oom_kill_task() repeatedly returns 1 and refuses to kill the
> chosen task while select_bad_process() will repeatedly choose the same
> task during the next retry.
>
> Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in
> oom_kill_task() to check for threads sharing the same memory will be
> removed in the next patch in this series where it will no longer be
> necessary.
>
> Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since
> these threads are immune from oom killing already.  They simply report an
> oom_adj value of OOM_DISABLE.
>
> Cc: Nick Piggin <npiggin@...e.de>
> Cc: Rik van Riel <riel@...hat.com>
> Cc: Mel Gorman <mel@....ul.ie>
> Signed-off-by: David Rientjes <rientjes@...gle.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
>  Documentation/filesystems/proc.txt |   15 +++++++----
>  fs/proc/base.c                     |   19 ++++++++++++---
>  include/linux/mm_types.h           |    2 +
>  include/linux/sched.h              |    1
>  mm/oom_kill.c                      |   34 +++++++++++++++++----------
>  5 files changed, 50 insertions(+), 21 deletions(-)
>
> diff -puN Documentation/filesystems/proc.txt~oom-move-oom_adj-value-from-task_struct-to-mm_struct Documentation/filesystems/proc.txt
> --- a/Documentation/filesystems/proc.txt~oom-move-oom_adj-value-from-task_struct-to-mm_struct
> +++ a/Documentation/filesystems/proc.txt
> @@ -1003,11 +1003,13 @@ CHAPTER 3: PER-PROCESS PARAMETERS
>  3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
>  ------------------------------------------------------
>
> -This file can be used to adjust the score used to select which processes
> -should be killed in an  out-of-memory  situation.  Giving it a high score will
> -increase the likelihood of this process being killed by the oom-killer.  Valid
> -values are in the range -16 to +15, plus the special value -17, which disables
> -oom-killing altogether for this process.
> +This file can be used to adjust the score used to select which processes should
> +be killed in an out-of-memory situation.  The oom_adj value is a characteristic
> +of the task's mm, so all threads that share an mm with pid will have the same
> +oom_adj value.  A high value will increase the likelihood of this process being
> +killed by the oom-killer.  Valid values are in the range -16 to +15 as
> +explained below and a special value of -17, which disables oom-killing
> +altogether for threads sharing pid's mm.
>
>  The process to be killed in an out-of-memory situation is selected among all others
>  based on its badness score. This value equals the original memory size of the process
> @@ -1021,6 +1023,9 @@ the parent's score if they do not share
>  are the prime candidates to be killed. Having only one 'hungry' child will make
>  parent less preferable than the child.
>
> +/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
> +oom-killing already.
> +
>  /proc/<pid>/oom_score shows process' current badness score.
>
>  The following heuristics are then applied:
> diff -puN fs/proc/base.c~oom-move-oom_adj-value-from-task_struct-to-mm_struct fs/proc/base.c
> --- a/fs/proc/base.c~oom-move-oom_adj-value-from-task_struct-to-mm_struct
> +++ a/fs/proc/base.c
> @@ -1006,7 +1006,12 @@ static ssize_t oom_adjust_read(struct fi
>
>        if (!task)
>                return -ESRCH;
> -       oom_adjust = task->oomkilladj;
> +       task_lock(task);
> +       if (task->mm)
> +               oom_adjust = task->mm->oom_adj;
> +       else
> +               oom_adjust = OOM_DISABLE;
> +       task_unlock(task);
>        put_task_struct(task);
>
>        len = snprintf(buffer, sizeof(buffer), "%i\n", oom_adjust);
> @@ -1035,11 +1040,19 @@ static ssize_t oom_adjust_write(struct f
>        task = get_proc_task(file->f_path.dentry->d_inode);
>        if (!task)
>                return -ESRCH;
> -       if (oom_adjust < task->oomkilladj && !capable(CAP_SYS_RESOURCE)) {
> +       task_lock(task);
> +       if (!task->mm) {
> +               task_unlock(task);
> +               put_task_struct(task);
> +               return -EINVAL;
> +       }
> +       if (oom_adjust < task->mm->oom_adj && !capable(CAP_SYS_RESOURCE)) {
> +               task_unlock(task);
>                put_task_struct(task);
>                return -EACCES;
>        }
> -       task->oomkilladj = oom_adjust;
> +       task->mm->oom_adj = oom_adjust;
> +       task_unlock(task);
>        put_task_struct(task);
>        if (end - buffer == 0)
>                return -EIO;
> diff -puN include/linux/mm_types.h~oom-move-oom_adj-value-from-task_struct-to-mm_struct include/linux/mm_types.h
> --- a/include/linux/mm_types.h~oom-move-oom_adj-value-from-task_struct-to-mm_struct
> +++ a/include/linux/mm_types.h
> @@ -240,6 +240,8 @@ struct mm_struct {
>
>        unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
>
> +       s8 oom_adj;     /* OOM kill score adjustment (bit shift) */
> +
>        cpumask_t cpu_vm_mask;
>
>        /* Architecture-specific MM context */
> diff -puN include/linux/sched.h~oom-move-oom_adj-value-from-task_struct-to-mm_struct include/linux/sched.h
> --- a/include/linux/sched.h~oom-move-oom_adj-value-from-task_struct-to-mm_struct
> +++ a/include/linux/sched.h
> @@ -1151,7 +1151,6 @@ struct task_struct {
>         * a short time
>         */
>        unsigned char fpu_counter;
> -       s8 oomkilladj; /* OOM kill score adjustment (bit shift). */
>  #ifdef CONFIG_BLK_DEV_IO_TRACE
>        unsigned int btrace_seq;
>  #endif
> diff -puN mm/oom_kill.c~oom-move-oom_adj-value-from-task_struct-to-mm_struct mm/oom_kill.c
> --- a/mm/oom_kill.c~oom-move-oom_adj-value-from-task_struct-to-mm_struct
> +++ a/mm/oom_kill.c
> @@ -58,6 +58,7 @@ unsigned long badness(struct task_struct
>        unsigned long points, cpu_time, run_time;
>        struct mm_struct *mm;
>        struct task_struct *child;
> +       int oom_adj;
>
>        task_lock(p);
>        mm = p->mm;
> @@ -65,6 +66,7 @@ unsigned long badness(struct task_struct
>                task_unlock(p);
>                return 0;
>        }
> +       oom_adj = mm->oom_adj;
>
>        /*
>         * The memory size of the process is the basis for the badness.
> @@ -148,15 +150,15 @@ unsigned long badness(struct task_struct
>                points /= 8;
>
>        /*
> -        * Adjust the score by oomkilladj.
> +        * Adjust the score by oom_adj.
>         */
> -       if (p->oomkilladj) {
> -               if (p->oomkilladj > 0) {
> +       if (oom_adj) {
> +               if (oom_adj > 0) {
>                        if (!points)
>                                points = 1;
> -                       points <<= p->oomkilladj;
> +                       points <<= oom_adj;
>                } else
> -                       points >>= -(p->oomkilladj);
> +                       points >>= -(oom_adj);
>        }
>
>  #ifdef DEBUG
> @@ -251,8 +253,12 @@ static struct task_struct *select_bad_pr
>                        *ppoints = ULONG_MAX;
>                }
>
> -               if (p->oomkilladj == OOM_DISABLE)
> +               task_lock(p);
> +               if (p->mm && p->mm->oom_adj == OOM_DISABLE) {
> +                       task_unlock(p);
>                        continue;
> +               }
> +               task_unlock(p);
>
>                points = badness(p, uptime.tv_sec);
>                if (points > *ppoints || !chosen) {
> @@ -304,8 +310,7 @@ static void dump_tasks(const struct mem_
>                }
>                printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d     %3d %s\n",
>                       p->pid, __task_cred(p)->uid, p->tgid, mm->total_vm,
> -                      get_mm_rss(mm), (int)task_cpu(p), p->oomkilladj,
> -                      p->comm);
> +                      get_mm_rss(mm), (int)task_cpu(p), mm->oom_adj, p->comm);
>                task_unlock(p);
>        } while_each_thread(g, p);
>  }
> @@ -367,8 +372,12 @@ static int oom_kill_task(struct task_str
>         * Don't kill the process if any threads are set to OOM_DISABLE
>         */
>        do_each_thread(g, q) {
> -               if (q->mm == mm && q->oomkilladj == OOM_DISABLE)
> +               task_lock(q);
> +               if (q->mm == mm && q->mm && q->mm->oom_adj == OOM_DISABLE) {
> +                       task_unlock(q);
>                        return 1;
> +               }
> +               task_unlock(q);
>        } while_each_thread(g, q);
>
>        __oom_kill_task(p, 1);
> @@ -393,10 +402,11 @@ static int oom_kill_process(struct task_
>        struct task_struct *c;
>
>        if (printk_ratelimit()) {
> -               printk(KERN_WARNING "%s invoked oom-killer: "
> -                       "gfp_mask=0x%x, order=%d, oomkilladj=%d\n",
> -                       current->comm, gfp_mask, order, current->oomkilladj);
>                task_lock(current);
> +               printk(KERN_WARNING "%s invoked oom-killer: "
> +                       "gfp_mask=0x%x, order=%d, oom_adj=%d\n",
> +                       current->comm, gfp_mask, order,
> +                       current->mm ? current->mm->oom_adj : OOM_DISABLE);
>                cpuset_print_task_mems_allowed(current);
>                task_unlock(current);
>                dump_stack();
> _
>
> Patches currently in -mm which might be from rientjes@...gle.com are
>
> origin.patch
> linux-next.patch
> cpusets-restructure-the-function-cpuset_update_task_memory_state.patch
> cpusets-update-tasks-page-slab-spread-flags-in-time.patch
> cpusetmm-update-tasks-mems_allowed-in-time.patch
> cpusetmm-update-tasks-mems_allowed-in-time-fix.patch
> cpusetmm-update-tasks-mems_allowed-in-time-cleanup.patch
> page-allocator-use-a-pre-calculated-value-instead-of-num_online_nodes-in-fast-paths-do-not-override-definition-of-node_set_online-with-macro.patch
> mm-setup_per_zone_inactive_ratio-do-not-call-for-int_sqrt-if-not-needed.patch
> mm-setup_per_zone_inactive_ratio-fix-comment-and-make-it-__init.patch
> page-allocator-warn-if-__gfp_nofail-is-used-for-a-large-allocation.patch
> mm-pm-freezer-disable-oom-killer-when-tasks-are-frozen.patch
> oom-move-oom_adj-value-from-task_struct-to-mm_struct.patch
> oom-avoid-unnecessary-mm-locking-and-scanning-for-oom_disable.patch
> memcg-add-file-based-rss-accounting.patch
> memcg-add-file-based-rss-accounting-fix-mem_cgroup_update_mapped_file_stat-oops.patch
>
> --
> To unsubscribe from this list: send the line "unsubscribe mm-commits" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ