Message-ID: <20140122192514.GA1779@redhat.com>
Date: Wed, 22 Jan 2014 20:25:14 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Alex Thorlton <athorlton@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Rik van Riel <riel@...hat.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Andy Lutomirski <luto@...capital.net>,
Al Viro <viro@...iv.linux.org.uk>,
Kees Cook <keescook@...omium.org>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH 0/2] mm->def_flags cleanups (Was: Change khugepaged to
respect MMF_THP_DISABLE flag)

On 01/22, Alex Thorlton wrote:
>
> At a glance, without testing, it looks like a good idea to me. By
> using def_flags, we leverage functionality that's already in place to
> achieve the same result. We don't need to add any new checks into the
> fault path or into khugepaged, since we're just leveraging the
> VM_HUGEPAGE/NOHUGEPAGE flag, which we already check for. We also get
> the behavior that you suggested (madvise is still respected, even with
> the new THP disable prctl set), for free with this method.

Yes, exactly.

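The mechanism described in the quoted paragraph can be sketched as a small userspace toy model. This is not kernel code: the `toy_*` names and structs are hypothetical, and the flag values are illustrative assumptions. It only shows why setting VM_NOHUGEPAGE in def_flags needs no new checks, and why an explicit madvise on a mapping still wins.

```c
/* Illustrative flag values for this sketch (assumptions, not from the thread). */
#define VM_HUGEPAGE   0x20000000UL
#define VM_NOHUGEPAGE 0x40000000UL

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct toy_mm  { unsigned long def_flags; };
struct toy_vma { unsigned long vm_flags; };

/* Models the proposed prctl(PR_SET_THP_DISABLE, on): toggle the default flag. */
static void toy_set_thp_disable(struct toy_mm *mm, int on)
{
	if (on)
		mm->def_flags |= VM_NOHUGEPAGE;
	else
		mm->def_flags &= ~VM_NOHUGEPAGE;
}

/* New mappings pick up the mm-wide default flags. */
static struct toy_vma toy_mmap(const struct toy_mm *mm, unsigned long flags)
{
	struct toy_vma vma = { .vm_flags = flags | mm->def_flags };
	return vma;
}

/* Models madvise(MADV_HUGEPAGE): an explicit per-mapping opt-in. */
static void toy_madvise_hugepage(struct toy_vma *vma)
{
	vma->vm_flags &= ~VM_NOHUGEPAGE;
	vma->vm_flags |= VM_HUGEPAGE;
}

/* The per-mapping check that is already in place; nothing new is added. */
static int toy_thp_allowed(const struct toy_vma *vma)
{
	return !(vma->vm_flags & VM_NOHUGEPAGE);
}
```

In this model, a mapping created after toy_set_thp_disable(&mm, 1) reports THP as not allowed, and calling toy_madvise_hugepage() on that mapping re-enables it, which is the "madvise is still respected" behavior mentioned above.
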
> I like the idea, but I think that it should probably be a separate
> change from the other few cleanups that you proposed along with it,

Yes, sure, that is why I sent them separately,

> since
> they're somewhat unrelated to this particular issue. Do you agree?

Not really. Note that without 1/2, VM_NOHUGEPAGE won't survive exec.
And without 2/2, madvise(MADV_HUGEPAGE) won't work after
PR_SET_THP_DISABLE.

But again, I think that these 2 simple cleanups make sense even without
PR_SET_THP_DISABLE.

> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index ac1842e..eb8b0fc 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2029,6 +2029,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  		if (arg2 || arg3 || arg4 || arg5)
> >  			return -EINVAL;
> >  		return current->no_new_privs ? 1 : 0;
> > +	case PR_SET_THP_DISABLE:
> > +	case PR_GET_THP_DISABLE:
> > +		down_write(&me->mm->mmap_sem);
> > +		if (option == PR_SET_THP_DISABLE) {
> > +			if (arg2)
> > +				me->mm->def_flags |= VM_NOHUGEPAGE;
> > +			else
> > +				me->mm->def_flags &= ~VM_NOHUGEPAGE;
> > +		} else {
> > +			error = !!(me->mm->flags && VM_NOHUGEPAGE);
>
> Should be:
>
> error = !!(me->mm->def_flags && VM_NOHUGEPAGE);

No, we need to return 1 if this bit is set ;)

Oleg.
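
The last remark reads as pointing at the operator in the quoted lines: both versions use logical `&&`, which is true for any non-zero flags word, whereas testing a single bit needs bitwise `&`. A minimal userspace sketch of the difference follows; the function names are hypothetical and the VM_NOHUGEPAGE value is an illustrative assumption.

```c
/* Illustrative value for this sketch (an assumption, not taken from the thread). */
#define VM_NOHUGEPAGE 0x40000000UL

/* Logical AND: 1 whenever flags is non-zero, whichever bits happen to be set. */
static int thp_disabled_logical(unsigned long flags)
{
	return !!(flags && VM_NOHUGEPAGE);
}

/* Bitwise AND: 1 only when the VM_NOHUGEPAGE bit itself is set. */
static int thp_disabled_bitwise(unsigned long flags)
{
	return !!(flags & VM_NOHUGEPAGE);
}
```

With a flags word of 0x1, which does not contain the VM_NOHUGEPAGE bit, the logical form still returns 1 while the bitwise form returns 0. Only the bitwise form returns 1 exactly when the bit is set.
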