[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.00.0912301437420.3369@sister.anvils>
Date: Wed, 30 Dec 2009 16:03:37 +0000 (GMT)
From: Hugh Dickins <hugh.dickins@...cali.co.uk>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
cc: Peter Zijlstra <peterz@...radead.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Nick Piggin <npiggin@...e.de>, Ingo Molnar <mingo@...e.hu>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Darren Hart <dvhltc@...ibm.com>,
Ulrich Drepper <drepper@...il.com>
Subject: Re: [PATCH] futex: Fix ZERO_PAGE cause infinite loop
This was a good find, thanks a lot for discovering it.
I'm afraid I'd quite forgotten all the mm-implementation-dependence
that has accumulated (for good performance reasons) in kernel/futex.c.
(Luckily, given that KSM pages are also PageAnon, it will happen to
be working correctly on them: that's not your concern, but something
your report has reminded me to consider.)
On Fri, 25 Dec 2009, KOSAKI Motohiro wrote:
> > > diff --git a/kernel/futex.c b/kernel/futex.c
> > > index 8e3c3ff..79b89cc 100644
> > > --- a/kernel/futex.c
> > > +++ b/kernel/futex.c
> > > @@ -254,6 +254,8 @@ again:
> > >
> > > page = compound_head(page);
> > > lock_page(page);
> > > + if (is_zero_pfn(page_to_pfn(page)))
> > > + goto anon_key;
If something like this or your replacment does go forward,
then I think that test is better inside the "if (!page->mapping)"
below. Admittedly that adds even more mm-dependence here (relying
on a zero page to have NULL page->mapping); but isn't page_to_pfn()
one of those functions which is trivial on many configs but expensive
on some? Better call it only in the rare case that it's needed.
Though wouldn't it be even better not to use is_zero_pfn() at all?
That was convenient in mm/memory.c because it had the pfn or pte right
at hand, but here a traditional (page == ZERO_PAGE(address)) would be
more efficient.
Which would save having to move is_zero_pfn() from mm/memory.c
to include/linux/mm.h - I'd prefer to keep it private if we can.
But for completeness, this would involve resurrecting the 2.6.19
MIPS move_pte(), which makes sure mremap() move doesn't interfere
with our assumptions. Something like
#define __HAVE_ARCH_MOVE_PTE
pte_t move_pte(pte_t pte, pgprot_t prot, unsigned long old_addr,
unsigned long new_addr)
{
if (pte_present(pte) && is_zero_pfn(pte_pfn(pte)))
pte = mk_pte(ZERO_PAGE(new_addr), prot);
return pte;
}
in arch/mips/include/asm/pgtable.h.
> > > if (!page->mapping) {
> > > unlock_page(page);
> > > put_page(page);
> > > @@ -268,6 +270,7 @@ again:
> > > * the object not the particular process.
> > > */
> > > if (PageAnon(page)) {
> > > + anon_key:
> > > key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
> > > key->private.mm = mm;
> > > key->private.address = address;
> >
> > Doesn't it make more sense to simply fail the futex op?
If we were to decide to fail the futex op on an unmodified ZERO_PAGE,
couldn't we simply delete the "again" label and just say
if (!page->mapping) {
/* Truncated file page, or a ZERO_PAGE */
unlock_page(page);
put_page(page);
return -EFAULT;
}
I don't know what the retry on that case was expected to achieve.
> >
> > What I mean is, all (?) futex ops assume userspace actually wrote
> > something to the address in question to begin with, therefore it can
> > never be the zero page.
> >
> > So it being the zero page means someone goofed up, and we should bail,
> > no?
I share Peter's wish to avoid cluttering this up with more special
case testing, but also your and Ulrich's wish to keep generality.
>
> Hm. probably we need to discuss more.
>
> Firstly, if we assume current glibc implimentation, you are right,
> we can assume userland always initialize the page explicitly before using
> futex. then we never seen zero page in futex.
>
> but, I don't think futex itself assume it now. at least man page
> doesn't describe such limilation. so, if you prefer bail and man fix,
> I'm acceptable maybe.
Here's another worry with the current futex implementation,
which might help me to decide which way to jump on the ZERO_PAGE.
Originally, a futex on a MAP_PRIVATE (!VM_MAYSHARE) area was necessarily
FUT_OFF_MMSHARED. Nowadays, with the get_user_pages_fast implementation,
we have no idea whether the vma is VM_MAYSHARE or not. So if a futex is
placed in a private area backed by a file, then it could be regarded as
FUT_OFF_INODE at futex_wait() time, but FUT_OFF_MMSHARED at futex_wake()
time.
Perhaps that's no problem at all, it's a long time since I was involved
with futexes, I think you and Peter will grasp the consequences quicker
than I shall.
But it seems no more improbable than the ZERO_PAGE case: some app
might place its futexes in the .data section of the executable,
which is a private mapping of the executable file.
If this case is also an issue, then perhaps we just need to update
the man page to say that whatever is responsible for initializing a
futex does need to write to it (or the page it's in) before it's used,
otherwise behaviour is undefined. (But we should then use the -EFAULT
patch above, we'd all prefer an error to busylooping.)
>
> I don't know any library except glibc use futex directly, or not.
> but we don't have any reason to assume futex is used only glibc.
> (plus, glibc might change its implementation in future release, perhaps)
I expect there are other implementations; but apparently none
that have yet reported this ZERO_PAGE behaviour as a problem.
...
>
> man page fix has another difficulty. in past days, zero pages was perfectly
> transparent from userland. then any man page don't describe "what's zero page".
> then some administrator don't know about zero page at all.
>
> So, if your main disliked point is goto statement, following patch is ok
> to me.
Although I'm as indecisive as Hamlet,
"goto or no goto, that is NOT the question":
I don't care, I think the goto in your original was fine.
>
> Finally, I agree this is really corner case issue. I can agree man page fix
> too. but I hope to know which point you dislike in my patch.
I just prefer not to have to add extra tests for the ZERO_PAGE
if we can get away without them.
Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists