Message-ID: <1311289053.25044.550.camel@pasglop>
Date: Fri, 22 Jul 2011 08:57:33 +1000
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Shan Hai <haishan.bai@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, paulus@...ba.org,
tglx@...utronix.de, walken@...gle.com, dhowells@...hat.com,
cmetcalf@...era.com, tony.luck@...el.com,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of dirty & young
On Fri, 2011-07-22 at 08:52 +1000, Benjamin Herrenschmidt wrote:
> > um, what problem. There's no description here of the user-visible
> > effects of the bug hence it's hard to work out what kernel version(s)
> > should receive this patch.
>
> Shan could give you an actual example (it was in the previous thread),
> but basically, livelock as the kernel keeps trying and trying the
> in_atomic op and never resolves it.
>
> > What kernel version(s) should receive this patch?
>
> I haven't dug. Probably anything it applies on as far as we did that
> trick of atomic + gup() for futex.

Oops, I just realized I didn't document the problem at all in the
changelog .. sorry. I meant to say:

On archs that use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young is inaccessible in
the PTE.

Additionally, some architectures might lazily flush the TLB when
relaxing write protection (by doing only a local flush), and expect a
fault to invalidate the stale entry if it's still present on another
processor.

The futex code assumes that if the "in_atomic()" access -EFAULT's, it
can "fix it up" by calling get_user_pages(), which would then be
equivalent to taking the fault.

However that isn't the case. get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state. It will eventually
update those bits ... in the struct page, but not in the PTE.

Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.

Basically, gup is the wrong interface for the job. The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.
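
To make the difference concrete, here is a toy userspace model of the
retry loop. It is only a sketch, not kernel code and not the patch
itself: every name in it (toy_pte, toy_page, toy_gup_fixup,
toy_fault_fixup, toy_futex_write) is made up for illustration, and it
assumes a write needs both the SW young and dirty bits set in the PTE.
The lazy TLB flush case is not modelled.

```c
/* Toy userspace model. NOT kernel code; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* A PTE on an arch that tracks young/dirty in software. */
struct toy_pte {
    bool present;
    bool user_write; /* PTE nominally allows user writes */
    bool young;      /* SW accessed bit */
    bool dirty;      /* SW dirty bit */
};

/* Stand-in for the per-page state kept outside the PTE. */
struct toy_page {
    bool referenced;
    bool dirty;
};

/* Assumption: a write only goes through once young and dirty are set. */
static bool toy_hw_can_write(const struct toy_pte *pte)
{
    return pte->present && pte->user_write && pte->young && pte->dirty;
}

/* Atomic user write: 0 on success, -1 standing in for -EFAULT. */
static int toy_atomic_write(const struct toy_pte *pte)
{
    return toy_hw_can_write(pte) ? 0 : -1;
}

/*
 * gup-like fixup: the PTE permissions look right, so no fault is
 * simulated. The bits are updated in the page, the PTE is untouched.
 */
static void toy_gup_fixup(struct toy_pte *pte, struct toy_page *page)
{
    if (pte->present && pte->user_write) {
        page->referenced = true;
        page->dirty = true;
    }
}

/* Fault-like fixup: a simulated write fault sets the bits in the PTE. */
static void toy_fault_fixup(struct toy_pte *pte, struct toy_page *page)
{
    (void)page; /* page state is not needed for this model */
    assert(pte->present && pte->user_write);
    pte->young = true;
    pte->dirty = true;
}

/*
 * Retry loop, bounded so the model terminates.
 * Returns the number of attempts used, or -1 if it never succeeded.
 */
static int toy_futex_write(struct toy_pte *pte, struct toy_page *page,
                           void (*fixup)(struct toy_pte *, struct toy_page *),
                           int max_tries)
{
    for (int i = 1; i <= max_tries; i++) {
        if (toy_atomic_write(pte) == 0)
            return i;
        fixup(pte, page);
    }
    return -1;
}
```

Starting from a present, writable PTE with young and dirty both clear,
the gup-like fixup never lets the write succeed no matter how many
times the loop retries, while the fault-like fixup lets it succeed on
the second attempt. Without the bound on the loop, the first case is
the livelock.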
Cheers,
Ben.
> > > since I'm
> > > starting to have the nasty feeling that you are hitting what is
> > > somewhat a subtly different issue or my previous patch should
> > > have worked (but then I might have done a stupid mistake as well)
> > > but let us know anyway.
> >
> > I assume that Shan reported the secret problem so I added the
> > reported-by to the changelog.
>
> He did :-) Shan, care to provide a rough explanation of what you
> observed ?
>
> Also Russell confirmed that ARM should be affected as well.
>
> Cheers,
> Ben.