Message-Id: <20101129093803.829F.A69D9226@jp.fujitsu.com>
Date: Mon, 29 Nov 2010 14:25:31 +0900 (JST)
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: kosaki.motohiro@...fujitsu.com,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>, pageexec@...email.hu,
Solar Designer <solar@...nwall.com>,
Eugene Teo <eteo@...hat.com>,
Brad Spengler <spender@...ecurity.net>,
Roland McGrath <roland@...hat.com>
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm
> On 11/25, Oleg Nesterov wrote:
> >
> > Great! I'll send the patch tomorrow.
> >
> > Even if you prefer another fix for 2.6.37/stable, I'd like to see
> > your review to know if it is correct or not (for backporting).
>
> OK, what do you think about the patch below?
Great. Thanks a lot.
>
> Seems to work, with this patch the test-case doesn't kill the
> system (sysctl_oom_kill_allocating_task == 0).
>
> I didn't dare to change !CONFIG_MMU case, I do not know how to
> test it.
>
> The patch is not complete, compat_copy_strings() needs changes.
> But, shouldn't it use get_arg_page() too? Otherwise, where do
> we check RLIMIT_STACK?
>
Because NOMMU doesn't have variable-length argv. Instead, it still
uses MAX_ARG_STRLEN, as the old MMU code did.
The hard-coded 32-page argv limit naturally prevents this nascent mm
issue.
> The patch asks for the cleanups. In particular, I think exec_mmap()
> should accept bprm, not mm. But I'd prefer to do this later.
>
> Oleg.
General request: please consider keeping Brad's Reported-by tag.
>
> include/linux/binfmts.h | 1 +
> fs/exec.c | 28 ++++++++++++++++++++++++++--
> 2 files changed, 27 insertions(+), 2 deletions(-)
>
> --- K/include/linux/binfmts.h~acct_exec_mem 2010-08-19 11:35:00.000000000 +0200
> +++ K/include/linux/binfmts.h 2010-11-25 20:19:33.000000000 +0100
> @@ -33,6 +33,7 @@ struct linux_binprm{
> # define MAX_ARG_PAGES 32
> struct page *page[MAX_ARG_PAGES];
> #endif
> + unsigned long vma_pages;
> struct mm_struct *mm;
> unsigned long p; /* current top of mem */
> unsigned int
> --- K/fs/exec.c~acct_exec_mem 2010-11-25 15:16:56.000000000 +0100
> +++ K/fs/exec.c 2010-11-25 20:20:49.000000000 +0100
> @@ -162,6 +162,25 @@ out:
> return error;
> }
>
> +static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
Please move this function inside the #ifdef CONFIG_MMU block; the nommu code doesn't use it.
> +{
> + struct mm_struct *mm = current->mm;
> + long diff = pages - bprm->vma_pages;
I'd prefer casting to a signed type before the assignment. It's safer.
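To illustrate the signedness point, here is a minimal userspace sketch, not kernel code. The helper name page_delta() is hypothetical. It casts both unsigned page counts to long before subtracting, so the arithmetic itself is signed rather than depending on unsigned wraparound followed by a conversion to long.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: signed difference between two page counts. */
static long page_delta(unsigned long new_pages, unsigned long old_pages)
{
	/* The casts below are only meaningful if both counts fit in a long. */
	assert(new_pages <= (unsigned long)LONG_MAX);
	assert(old_pages <= (unsigned long)LONG_MAX);

	/* Cast first, then subtract, so the subtraction is done in signed math. */
	return (long)new_pages - (long)old_pages;
}
```

With this shape, a shrinking count such as page_delta(0, 5) gives a negative delta directly, which is the unaccounting case.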
> +
> + if (!mm || !diff)
> + return;
> +
> + bprm->vma_pages += diff;
> +
> +#ifdef SPLIT_RSS_COUNTING
> + add_mm_counter(mm, MM_ANONPAGES, diff);
> +#else
> + spin_lock(&mm->page_table_lock);
> + add_mm_counter(mm, MM_ANONPAGES, diff);
> + spin_unlock(&mm->page_table_lock);
> +#endif
OK, looks good.
> +}
> +
> #ifdef CONFIG_MMU
>
> static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> @@ -186,6 +205,8 @@ static struct page *get_arg_page(struct
> unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
> struct rlimit *rlim;
>
> + acct_arg_size(bprm, size / PAGE_SIZE);
> +
> /*
> * We've historically supported up to 32 pages (ARG_MAX)
> * of argument strings even with small stacks
> @@ -1003,6 +1024,7 @@ int flush_old_exec(struct linux_binprm *
> /*
> * Release all of the old mmap stuff
> */
> + acct_arg_size(bprm, 0);
Why do we need this unaccount here? I mean: 1) if exec_mmap() succeeds,
we don't need to unaccount at all; 2) if exec_mmap() fails, the epilogue of
do_execve() does the unaccounting.
> retval = exec_mmap(bprm->mm);
> if (retval)
> goto out;
> @@ -1426,8 +1448,10 @@ int do_execve(const char * filename,
> return retval;
>
> out:
> - if (bprm->mm)
> - mmput (bprm->mm);
> + if (bprm->mm) {
> + acct_arg_size(bprm, 0);
> + mmput(bprm->mm);
> + }
>
> out_file:
> if (bprm->file) {
>