lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 31 Jul 2009 15:50:27 +0900 (JST) From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com> To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> Cc: kosaki.motohiro@...fujitsu.com, David Rientjes <rientjes@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, Rik van Riel <riel@...hat.com>, Paul Menage <menage@...gle.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org Subject: Re: [patch -mm v2] mm: introduce oom_adj_child > On Thu, 30 Jul 2009 12:05:30 -0700 (PDT) > David Rientjes <rientjes@...gle.com> wrote: > > > On Thu, 30 Jul 2009, KAMEZAWA Hiroyuki wrote: > > > > > > If you have suggestions for a better name, I'd happily ack it. > > > > > > > > > > Simply, reset_oom_adj_at_new_mm_context or some. > > > > > > > I think it's preferred to keep the name relatively short which is an > > unfortuante requirement in this case. I also prefer to start the name > > with "oom_adj" so it appears alongside /proc/pid/oom_adj when listed > > alphabetically. > > > But misleading name is bad. > > > > > > > > 2. More simple plan is like this, IIUC. > > > > > > > > > > fix oom-killer's select_bad_process() not to be in deadlock. > > > > > > > > > > > > > Alternate ideas? > > > > > > > At brief thiking. > > > > > > 1. move oom_adj from mm_struct to signal struct. or somewhere. > > > (see copy_signal()) > > > Then, > > > - all threads in a process will have the same oom_adj. > > > - vfork()'ed thread will inherit its parent's oom_adj. > > > - vfork()'ed thread can override oom_adj of its own. > > > > > > In other words, oom_adj is shared when CLONE_PARENT is not set. > > > > > > > Hmm, didn't we talk about signal_struct already? The problem with that > > approach is that oom_adj values represent a killable quantity of memory, > > so having multiple threads sharing the same mm_struct with one set to > > OOM_DISABLE and the other at +15 will still livelock because the oom > > killer can't kill either. > > > > > 2. rename mm_struct's oom_adj as shadow_oom_adj. > > > > > > update this shadow_oom_adj as the highest oom_adj among > > > the values all threads share this mm_struct have. > > > This update is done when > > > - mm_init() > > > - oom_adj is written. > > > > > > User's > > > # echo XXXX > /proc/<x>/oom_adj > > > is not necessary to be very very fast. > > > > > > I don't think a process which calls vfork() is multi-threaded. > > > > > > 3. use shadow_oom_adj in select_bad_process(). > > > > > > > Ideas 2 & 3 here seem to be a single proposal. The problem is that it > > still leaves /proc/pid/oom_score to be inconsistent with the badness > > scoring that the oom killer will eventually use since if it oom kills one > > task, it must kill all tasks sharing the same mm_struct to lead to future > > memory freeing. > > > yes. > > > Additionally, if you were to set one thread to OOM_DISABLE, storing the > > highest oom_adj value in mm_struct isn't going to help because > > oom_kill_task() will still require a tasklist scan to ensure no threads > > sharing the mm_struct are OOM_DISABLE and the livelock persists. > > > > Why don't you think select_bad_process()-> oom_kill_task() implementation is bad ? > IMHO, it's bad manner to fix an os-implementation problem by adding _new_ user > interface which is hard to understand. > > > > In other words, the issue here is larger than the inheritance of the > > oom_adj value amongst children, it addresses a livelock that neither of > > your approaches solve. The fix actually makes /proc/pid/oom_adj (and > > /proc/pid/oom_score) consistent with how the oom killer behaves. > > This oom_adj_child itself is not related to livelock problem. Don't make > the problem bigger than it is. > oom_adj_child itself is just a problem how to handle vfork(). I made my proposal patch today. this patch have following charactatistics. o per-process oom_adj (by signal_struct) o don't live-lock Please comment. Patch against 2.6.31-rc4 =========================== Subject: [PATCH] move oom_adj to task->signal test program ---------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define BUF_SIZE 128 void oom_adj_print(void) { FILE* file; char buf[BUF_SIZE]; file = fopen("/proc/self/oom_adj", "r"); if (!file) { perror("fopen"); exit(1); } fscanf(file, "%s\n", buf); printf("%s\n", buf); fclose(file); } void oom_adj_write(int value) { FILE* file; size_t ret; file = fopen("/proc/self/oom_adj", "w"); if (!file) { perror("fopen"); exit(1); } ret = fprintf(file, "%d", value); if (!ret) { perror("fprintf"); exit(1); } fclose(file); } int main(void) { int status; oom_adj_print(); oom_adj_write(1); oom_adj_print(); printf("vfork\n"); if (vfork() == 0) { /* child */ oom_adj_print(); oom_adj_write(2); oom_adj_print(); _exit(0); } wait(&status); oom_adj_print(); return 0; } test result: --------------------------------- % ./a.out 0 1 vfork 1 2 1 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com> Reported-by: Paul Menage <menage@...gle.com> Cc: David Rientjes <rientjes@...gle.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> Cc: Rik van Riel <riel@...hat.com>, Cc: Andrew Morton <akpm@...ux-foundation.org>, --- fs/proc/base.c | 7 ++++--- include/linux/mm_types.h | 3 ++- include/linux/oom.h | 1 + include/linux/sched.h | 2 ++ kernel/exit.c | 2 ++ kernel/fork.c | 2 ++ mm/oom_kill.c | 14 +++++++++----- 7 files changed, 22 insertions(+), 9 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 3ce5ae9..c64499e 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1008,7 +1008,7 @@ static ssize_t oom_adjust_read(struct file *file, char __user *buf, return -ESRCH; task_lock(task); if (task->mm) - oom_adjust = task->mm->oom_adj; + oom_adjust = task->signal->oom_adj; else oom_adjust = OOM_DISABLE; task_unlock(task); @@ -1046,12 +1046,13 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf, put_task_struct(task); return -EINVAL; } - if (oom_adjust < task->mm->oom_adj && !capable(CAP_SYS_RESOURCE)) { + if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) { task_unlock(task); put_task_struct(task); return -EACCES; } - task->mm->oom_adj = oom_adjust; + task->signal->oom_adj = oom_adjust; + task->mm->oom_adj_cached = OOM_CACHE_DEFAULT; task_unlock(task); put_task_struct(task); if (end - buffer == 0) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7acc843..f93f97f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -240,7 +240,8 @@ struct mm_struct { unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ - s8 oom_adj; /* OOM kill score adjustment (bit shift) */ + s8 oom_adj_cached; /* mirror from signal_struct->oom_adj. + in vfork case, multiple processes use the same mm. */ cpumask_t cpu_vm_mask; diff --git a/include/linux/oom.h b/include/linux/oom.h index a7979ba..a219480 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -3,6 +3,7 @@ /* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */ #define OOM_DISABLE (-17) +#define OOM_CACHE_DEFAULT (15) /* inclusive */ #define OOM_ADJUST_MIN (-16) #define OOM_ADJUST_MAX 15 diff --git a/include/linux/sched.h b/include/linux/sched.h index 3ab08e4..e10b12b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -629,6 +629,8 @@ struct signal_struct { unsigned audit_tty; struct tty_audit_buf *tty_audit_buf; #endif + + s8 oom_adj; /* OOM kill score adjustment (bit shift) */ }; /* Context switch must be unlocked if interrupts are to be enabled */ diff --git a/kernel/exit.c b/kernel/exit.c index 869dc22..c741a45 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -48,6 +48,7 @@ #include <linux/fs_struct.h> #include <linux/init_task.h> #include <linux/perf_counter.h> +#include <linux/oom.h> #include <trace/events/sched.h> #include <asm/uaccess.h> @@ -688,6 +689,7 @@ static void exit_mm(struct task_struct * tsk) enter_lazy_tlb(mm, current); /* We don't want this task to be frozen prematurely */ clear_freeze_flag(tsk); + mm->oom_adj_cached = OOM_CACHE_DEFAULT; task_unlock(tsk); mm_update_next_owner(mm); mmput(mm); diff --git a/kernel/fork.c b/kernel/fork.c index 9b42695..b7cb474 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -62,6 +62,7 @@ #include <linux/fs_struct.h> #include <linux/magic.h> #include <linux/perf_counter.h> +#include <linux/oom.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -426,6 +427,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p) init_rwsem(&mm->mmap_sem); INIT_LIST_HEAD(&mm->mmlist); mm->flags = (current->mm) ? current->mm->flags : default_dump_filter; + mm->oom_adj_cached = OOM_CACHE_DEFAULT; mm->core_state = NULL; mm->nr_ptes = 0; set_mm_counter(mm, file_rss, 0); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 175a67a..eae2d78 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -58,7 +58,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime) unsigned long points, cpu_time, run_time; struct mm_struct *mm; struct task_struct *child; - int oom_adj; + s8 oom_adj; task_lock(p); mm = p->mm; @@ -66,7 +66,10 @@ unsigned long badness(struct task_struct *p, unsigned long uptime) task_unlock(p); return 0; } - oom_adj = mm->oom_adj; + + if (mm->oom_adj_cached < p->signal->oom_adj) + mm->oom_adj_cached = p->signal->oom_adj; + oom_adj = mm->oom_adj_cached; if (oom_adj == OOM_DISABLE) { task_unlock(p); return 0; @@ -307,7 +310,8 @@ static void dump_tasks(const struct mem_cgroup *mem) } printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d %3d %s\n", p->pid, __task_cred(p)->uid, p->tgid, mm->total_vm, - get_mm_rss(mm), (int)task_cpu(p), mm->oom_adj, p->comm); + get_mm_rss(mm), (int)task_cpu(p), p->signal->oom_adj, + p->comm); task_unlock(p); } while_each_thread(g, p); } @@ -350,7 +354,7 @@ static int oom_kill_task(struct task_struct *p) task_lock(p); mm = p->mm; - if (!mm || mm->oom_adj == OOM_DISABLE) { + if (!mm || p->signal->oom_adj == OOM_DISABLE) { task_unlock(p); return 1; } @@ -381,7 +385,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, printk(KERN_WARNING "%s invoked oom-killer: " "gfp_mask=0x%x, order=%d, oom_adj=%d\n", current->comm, gfp_mask, order, - current->mm ? current->mm->oom_adj : OOM_DISABLE); + current->mm ? current->signal->oom_adj : OOM_DISABLE); cpuset_print_task_mems_allowed(current); task_unlock(current); dump_stack(); -- 1.6.0.GIT -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists